Techniques to Produce Good Web Service
        Compositions in The Semantic Grid

                                 Eduardo Blanco

                          Universidad Simón Bolı́var,
          Departamento de Computación y Tecnologı́a de la Información,
                  Apartado 89000, Caracas 1080-A, Venezuela
                           {eduardo}@ldc.usb.ve
                Advisors: Yudith Cardinale and Marı́a Esther Vidal


      Abstract. The Grid has emerged as a new distributed computing in-
      frastructure for advanced science and engineering. It aims at enabling
      resource sharing as means to facilitate a problem-solving approach for
      dynamic environments that is based on the integration of available re-
      sources. Wide agreement that a flexible Grid is not possible without
      the support of semantic technologies, has lead to the term “Semantic
      Grid“. Transparent service composition has a great potential to facilitate
      the integration of Web Services deployed in the Grid. However, discov-
      ering, dynamically composing and optimizing large-scale Web Services
      (e.g., in the rank of 1,000 to 100,000 services) is a challenging problem.
      We approached the problem of Web Service compositions in the Grid
      as an optimization problem, in which for a given request a good plan
      has to be generated. In this work, we propose the definition of several
      optimization strategies to identify good orderings of Web Service com-
      positions.


1   Research Problem
The Grid provides the protocols, services, and software development kits needed
to enable flexible, controlled resource sharing on a large scale. Automatic discov-
ery and coordination of services in the Grid is not only desirable, but necessary.
Users need tools that assist them on taking full advantage of the possible large
number of available services. Any form of automated composition needs an infras-
tructure where all resources are adequately described in a machine-processable
form. Semantic Web technologies provide the means to incorporate such de-
scriptions to the Grid infrastructure leading to the Semantic Grid. Thus, it is
possible to apply Semantic Web technologies in grid-computing developments.
    Numerous efforts have focused on the task of service discovery and Web Ser-
vice composition (the WSC problem). However, few have addressed the problem
of identifying efficient Web Service compositions in a large-scale scenario, as
the one presented in the Grid vision [1]. In such scenarios, the association
among services may be extremely complex, and the search space of possible
Web Service compositions could be very large. In consequence, the task of
2       E. Blanco

finding an optimal plan could be not feasible in real time; thus, our objective is
to find sub-optimal plans in a reasonable time.
    In Grid platforms requests are usually scheduled as batch processes, and
real time constraints are relaxed. Thus, it is possible to do a greater exploration
of the space of possible Web Service compositions, while the batch process is
waiting to be evaluated.
    On the other hand, some Grid resources can only be accessed at certain
times. It may be required to incorporate into the space of alternatives, the possi-
bility of executions where intermediate results are stored and retrieved according
to time restrictions. In other words, all services used by a plan do not need to
be available at the same time.
    In this work, we propose the definition of optimization strategies to iden-
tify good orderings of Web Service compositions in large-scale Web Ser-
viceenvironments, e.g., in the rank of 1,000 to 100,000 available services. These
strategies will follow different meta-heuristics to explore the space of possibili-
ties, while minimizing the estimated evaluation cost. We have already defined
two strategies that prune the space of possibilities. Our initial results show that
the compositions identified by our algorithms, are close to the optimal solution.


2   Related Work

The problem of identifying a good WSC in the Grid is related to problems that
have been studied in query optimization and WSC. In this section we consider the
main related approaches in each of these areas.
    In the context of the Web, several strategies have been presented to identify
good evaluation plans where sources have limited query processing capabilities;
then, the optimizer task is to identify a good ordering of the sub-goals of a query
where limited processing capabilities of each considered source, are satisfied [2,
3].
    Typically, existing approaches achieve the challenge of identifying good plans
by representing query capabilities as binding patterns and using these patterns
and meta-heuristics to traverse the space of source plans. Each sub-goal in the
query is rewritten in the plan by using sources that define the sub-goal; limited
processing capabilities of the sources are satisfied in the plan with query bindings
or attributes projected out by previous sources in the plan.
    The WSC problem has been extensively treated in the literature, and diverse
solutions that take advantage of AI techniques and Search Meta-Heuristics, have
been proposed. First, in the context of AI, the WSC problem has been represented
as a planning problem where actions to be taken by the planner are defined in
terms of service preconditions and effects.
    The description of a planning domain includes a set of planning operators and
methods that establish the way a task can be decomposed into smaller subtasks.
The description of a planning problem contains an initial state as in classical
planning. Instead of a goal formula, there is a partially-ordered set of tasks to
accomplish. Planning proceeds by decomposing tasks recursively into smaller and
                             Good Web Service Compositions in the Grid           3

smaller subtasks, until primitive tasks, which can be performed directly by using
one planning operator, are reached. For non-primitive tasks, the planner chooses
an applicable method, instantiates it to decompose the task into subtasks, and
then chooses and instantiates other methods to decompose the subtasks even
further. If the constraints on the subtasks or the interactions among them prevent
the plan from being feasible, the planning system backtracks and tries other
methods.
   As any planning problem, the approach presented in [4], requires the for-
malization of the domain-dependent control knowledge in the planner. Thus,
a domain expert is needed in order to achieve good performance in real-world
domains.
    An approach that uses Answer Set Programming is presented in [5]. It shows
that service descriptions can be expressed in a rule based language that allows
to search a repository efficiently and to build solutions that solve a goal with
respect to soft and hard constrain. The author reports that the solution performs
very well in this rather simple domain. It defines a simple cost function instead
of a utility function. Its strength is that it provides means to gain all solutions
for a given problem despite the cost of computation time and space. He states
that dedicated software employing fast heuristic algorithms could rapidly find a
good solution for user requests in much reasonable times.
    In the context of Search Meta-Heuristics, the SAM (Service Aggregation
Matchmaking) algorithm is defined [6]. It makes use of an OWL-S ontology,
and explicitly returns a sequence of atomic processes that need to be executed
in order to achieve the desired result.
    SAM follows a greedy approach in which only one sub-plan is generated in
each iteration. In each sub-plan, a sub-set of the output attributes is produced
considering some of the bindings given in the query. The algorithm ends when
all the output attributes are produced. In terms of time, SAM is able to scale
up in environments with a moderate number of services (e.g., in the rank of
100 to 200 services). However, since SAM does not consider any cost metric or
optimization criteria to compose the services, plans produced by SAM may be
costly. To exacerbate this problem, SAM may add processes to the plan that
are not needed to produce the output required by the user. Thus, the quality of
generated plans may be far from optimal.
   Semantic Grid [7] is an infrastructure where it is possible to apply Se-
mantic Web technologies in grid-computing developments [8, 9]. In the context
of Semantic Grid, the project ARGUGRID [10] provides a new model for
programming the Grid; it uses argumentative agent technology and semantic
descriptions to facilitate the dynamic composition of services.
   Finally, in [11], the WSC problem is defined as a planning problem where a
workflow is generated automatically. It takes as inputs the desired data products,
and the planner uses heuristic control rules to identified a high-quality solution.
Plan quality is measured with respect to a global utility function that does not
represent the dynamic properties of Grid environments.
4        E. Blanco

3     Contributions
We hypothesize that if there is a large number of Web Services published in
different sites then, services’ performance may vary. Thus, it is imperative that
solutions to the WSC problem consider an estimate of the service evaluation cost.
This cost is used to prune the space of possibly costly plans of services, and to
identify the service composition with the lowest estimated cost.
    We propose solutions to the WSC problem for large-scale platforms, e.g.,
Grids. These solutions will be adapted to consider the dynamic characteristic
presented in such platforms. In particular, we address the problem of selecting
and coordinating services that satisfy some of the constraints presented in the
Grid, while the evaluation time is minimized.
    Plans will be generated considering the time constraints of Grid resources.
Thus, we will incorporate into the space of alternatives, the possibility of exe-
cutions where intermediate results are stored and retrieved according to these
constraints.


4     Evaluation
To validate the quality of our techniques, we plan to conduct experimental stud-
ies that compare our approach against some of the existing projects. We propose
to use at least two available test sets - EEE05 and ICEBE051 .
    We will also analyze the performance of our solutions in real-world scenarios.
For this, we will generate a large number of ontological Web Service descrip-
tions for a given scientist community. WSBen[12] is a tool able of automatically
generate sets of WSDL service descriptions. WSBen is inspired by extensive studies
on real Web Services and therefore, it is designed to support various network
topologies and distributions. We will extend this tool to generate ontological
service descriptions.


5     Work Plan
We consider that the project can be achieved within the following stages:

1. We propose approaches that adapt some optimization techniques to the WSC
   problem. We have already defined two extensions to the algorithm SAM.
   Initial results show that the extensions are effective. In terms of plan quality
   and optimization time, our generated plans, outperform results produced by
   SAM [13].
2. We will extend WSBen to generate large sets of ontological Web Service
   descriptions.
3. We will propose cost models and test them with the generated sets; we will
   study the behavior of our approaches using different distributions and Web
   Service inter-relations.
1
    They are available at http : //www.comp.hkbu.edu.hk/ ctr/wschallenge/news.html#dataset
                               Good Web Service Compositions in the Grid             5

 4. We will evaluate existing strategies proposed to discover and compose ser-
    vices in The Semantic Grid.
 5. We will enrich the service composition algorithm and the cost models with
    the experiences on Semantic Grid. In this sense, it is mainly important to
    consider the dynamic characteristic of grid platforms.


References
 1. Foster, I., Kesselman, C., Tuecke, S.: The Anatomy of the Grid: Enabling Scal-
    able Virtual Organizations. International Journal of High Performance Computing
    Applications 15(3) (2001)
 2. Levy, A., Rajaraman, A., Ordille, J.: Querying heterogeneous information sources
    using source descriptions. In: Proceedings of the VLDB Conference. (1996)
 3. Papakonstantinou, Y., Gupta, A., Haas, L.: Capabilities-based query rewriting in
    mediator systems. In: Conf. on Parallel and Distributed Inform. Systems. (1996)
 4. Kuter, U., Sirin, E., Nau, D.S., Parsia, B., Hendler, J.A.: Information gathering
    during planning for web service composition. In: Intl. Semantic Web Conf. (2004)
    335–349
 5. Rainer, A.: Web service composition using answer set programming. In: PuK 2005.
    (2006)
 6. Brogi, A., Corfini, S., Popescu, R.: Composition-oriented service discovery. In:
    Proc. of Software Composition’05, LNCS. Volume 3628. (2005) 15–30
 7. De Roure, D., Jennings, N., Shadbolt, N.: The semantic grid: A future e-science
    infrastructure. In Berman, F., Fox, G., Hey, A.J.G., eds.: Grid Computing - Making
    the Global Infrastructure a Reality. John Wiley and Sons Ltd. (2003) 437–470
 8. Goble, C., Roure, D.D.: The grid: an application of the semantic web. SIGMOD
    Rec. 31(4) (2002) 65–70
 9. Pahl, C.: An ontology-based framework for semantic grid service composition. Grid
    Services Engineering and Management 3270 (2004) 63–77
10. Programme, E.C.S.F.: ARGUGRID (2006) http://www.argugrid.org/.
11. Deelman, E., Blythe, J., Gil, Y., Kesselman, C., Mehta, G., Vahi, K., Lazzarini,
    A., Arbree, A., Cavanaugh, R., Koranda, S.: Mapping abstract complex workflows
    onto grid environments. Journal of Grid Computing, Vol. 1, No. 1, (2003)
12. Oh, S.C., Kil, H., Lee, D., Kumara, S.R.T.: Wsben: A web services discovery and
    composition benchmark. In: ICWS ’06: Proceedings of the IEEE International
    Conference on Web Services (ICWS’06), Washington, DC, USA, IEEE Computer
    Society (2006) 239–248
13. Blanco, E., Cardinale, Y., Vidal, M.E., Graterol, J.: Techniques to produce optimal
    Web Service Compositions. In: 2008 IEEE Congress on Services 2008 - Part I
    (SERVICES-1 2008), Honolulu, Hawaii, USA, IEEE Computer Society (2008) To
    appear.