<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semantics-Based Composition of EMBOSS Services with Bio-jETI</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anna-Lena Lamprecht</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Naujokat</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernhard Steffen</string-name>
          <email>bernhard.steffen@cs.tu-dortmund.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tiziana Margaria</string-name>
          <email>tiziana.margaria@cs.uni-potsdam.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Technical University Dortmund, Chair for Programming Systems</institution>
          ,
          <addr-line>Dortmund, D-44227</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University Potsdam, Chair for Service and Software Engineering</institution>
          ,
          <addr-line>Potsdam, D-14482</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Bio-jETI is a framework for model-based, graphical design, execution and management of bioinformatics analysis processes. Formal methodology like automatic service composition extends the framework and, in particular, allows for semantically aware workflow development. In this study we apply the workflow synthesis methodology to the EMBOSS suite of sequence analysis tools. As neither the tool suite itself nor its various interfaces provide ready-to-use semantic annotations, we set up a domain model that uses a high-level, semantically meaningful type nomenclature to describe the input/output behavior of the single EMBOSS tools. Based on this domain model, we demonstrate how working with the large, heterogeneous, and hence manually intractable EMBOSS collection is simplified by our service composition methodology.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Research projects in modern molecular biology rely on increasingly complex
combinations of computational methods to handle the data that is produced in the
life science laboratories. A variety of bioinformatics databases, algorithms and
tools is available for specific analysis tasks. Combining them to answer a specific
biological question defines more or less complex analysis workflows or processes.
Software systems that facilitate the systematic development and automation of such
workflows have become very popular in the community.</p>
      <p>
        More than in other domains, the heterogeneous world of services in
bioinformatics demands a methodology to classify and relate resources in a manner
that is accessible to both humans and machines. The Semantic Web, which is meant to
address exactly this challenge, is currently one of the most ambitious projects
in computer science. Collective efforts for modeling the bioinformatics domain
have already led to a basis of standards for semantic service descriptions and
meta-information. More concretely, the corresponding state of the art can be
characterized as follows:
- Domain Modeling: Has started in particular in the BioMoby project [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
where a number of services have been prepared, mainly to support
semantics-based retrieval.
- Components and Interfaces: A popular example is the EMBOSS suite [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], a
large collection of diverse tools for a specific field of bioinformatics (sequence
analysis) that is already integrated into a common technical interface. That
is, the components are 'wrapped' in order to simplify their use: they work
seamlessly for a number of different formats and types, and therefore free
the user from caring about compatibility and type conflicts.
- Design Methodology: There are different tools for the graphical development
of analysis processes [3-7], most of them data-flow based and without
connection to semantic modeling. An exception is Bio-jETI [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ], which supports
the incorporation of semantically modeled domain information for
control-flow oriented process construction.
- Validation: Bio-jETI is unique in supporting domain-modeling-based
verification of processes.
      </p>
      <p>In this paper, we present an extension of the Bio-jETI platform that simplifies
the process development phase (item 3) so that it becomes accessible even to
biologists without programming background, by
- Extending the currently available domain modeling to comprise the EMBOSS suite [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
- Achieving type compatibility beyond predefined 'compatibility wrappers' by
dynamic mediator synthesis. This allows us to also cover third-party
components without any programming effort.
- Generalizing Bio-jETI's synthesis technology to support a flexible kind of
loose process programming: loosely specified components and partially
defined connectors are concretized by ontology-based synthesis.
- Applying model checking to check global properties of complex (partially
synthesized) processes.
      </p>
      <p>The paper is structured as follows. Section 2 describes the workflow synthesis
technology that is available in Bio-jETI from a user's perspective. In Section 3 we
present the setup of the EMBOSS domain. As neither the tool suite itself nor its
various interfaces provide ready-to-use semantic annotations, we extracted the
relevant user-level semantic information from the tool descriptions and built a
domain model that uses a high-level, semantically meaningful type nomenclature
to describe the input/output behavior of the single EMBOSS tools. Based on
this domain model, we demonstrate in Section 4 how working with the large,
heterogeneous, and hence manually intractable EMBOSS collection is simplified
by our service composition methodology. The paper ends with a conclusion and
perspectives for future work in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>Semantics-Based Service Composition in Bio-jETI</title>
      <p>
        Bio-jETI [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]<sup>3</sup> is a framework for model-based, graphical design, execution and
management of bioinformatics analysis processes. It has been used in a number
of different bioinformatics projects [10-13] and is continuously evolving as new
service libraries and service and software technologies become established.
      </p>
      <p>
        Technically, Bio-jETI is based on the jABC modeling framework [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] as an
intuitive, graphical user interface and the jETI electronic tool integration
platform [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] for dealing with remote services. Using the jABC technology, process
models, called Service Logic Graphs (SLGs) are constructed graphically by
placing process building blocks, called Service Independent Building Blocks (SIBs),
on a canvas and connecting them according to the ow of control. SLGs are
directly executable by an interpreter component, and they can be compiled into
a variety of target languages via the Genesys code generation framework [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>The Bio-jETI Graphical User Interface (GUI) is structured as follows (cf.
Figure 1): The major part of the interface is the canvas, where the SIBs
are placed and connected to form the SLG (A). The SIB library (B) shows the
available SIBs, whereas the Inspector Pane (C) hosts various GUI elements,
such as global model configurations and SIB parameter editing, but also task-specific
elements for model checking, local checking, synthesis, etc. Common structures
like the status bar and menu (D) complete the interface. Modeling with the
Bio-jETI framework usually consists of the following steps:
1. drag &amp; drop SIBs from the SIB library to the canvas,
2. connect SIBs with edges,
3. assign branch names,
4. define one start SIB, the entry point for execution, and
5. directly execute the model using an interpreter plugin (E) (window (F) is
an example of a window opened by the currently executed SIB) or generate
executable code with the Genesys code generation framework.
<sup>3</sup> http://biojeti.cs.tu-dortmund.de/</p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref17 ref9">17, 9</xref>
        ], we presented our approach to semantics-based service
composition in the Bio-jETI platform. By integrating automatic service composition
functionality into an intuitive, graphical process management framework, we
maintain the usability of the latter for semantically aware workflow
development. Furthermore, we can integrate services and domain knowledge from any
kind of heterogeneous resource at any location, and are not restricted to the
semantically annotated services of a particular platform.
      </p>
      <p>We now present PROPHETS<sup>4</sup>, an extension to the Bio-jETI framework that
seamlessly integrates automatic service composition into the jABC. It enhances
the previous approaches by including more formal methodology while requiring
the user to know less of it, thus enabling the system to be used by a
wider range of users. These enhancements are in particular:
- visualized/graphical semantic domain modeling,
- loose specification within the process model,
- non-formal specification of constraints using natural language templates, and
- automatic generation of model checking formulas.</p>
      <p>Two roles are envisaged for using this extension. The domain expert provides
information on the available services and a semantic classification of these services
and their input and output types. The application expert is the one who uses
the available services to model the processes. The following subsections each deal with
one of these roles, starting with the domain expert.</p>
      <sec id="sec-2-1">
        <title>Domain Modeling</title>
        <p>The domain model essentially consists of service definitions and their semantic
classifications. The service definition enhances the SIBs by meta-information
regarding their input/output behavior. Throughout our framework, types are
represented by symbolic names, thus abstracting from concrete implementations.
A service is therefore characterized by two subsets of the set of all symbolic type names,
namely its input types and its output types. The SIBs' meta-information is stored in a
separate file within the project directory.</p>
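        <p>As a minimal illustration of this characterization, the following Python sketch represents a service by two sets of symbolic type names. The class name ServiceDefinition, the helper is_applicable, and the type signatures given for emma and showalign are assumptions made for this example; they do not reproduce the file format or the API that Bio-jETI actually uses.</p>
        <preformat>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDefinition:
    """Hypothetical stand-in for the meta-information attached to a SIB."""
    name: str
    input_types: frozenset   # symbolic names of the types the service consumes
    output_types: frozenset  # symbolic names of the types the service produces

    def is_applicable(self, available_types):
        """A service can run once all of its input types are available."""
        return self.input_types.issubset(available_types)


# Illustrative signatures (assumed for this sketch, not taken from Table 1).
emma = ServiceDefinition("emma", frozenset({"Sequence"}), frozenset({"Alignment"}))
showalign = ServiceDefinition("showalign", frozenset({"Alignment"}), frozenset())

available = {"Sequence"}
can_run_emma = emma.is_applicable(available)
can_run_showalign = showalign.is_applicable(available)
```
        </preformat>
        <p>With only a Sequence available, emma is applicable in this toy setting while showalign is not; it becomes applicable once the output types of emma are added to the available types.</p>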
        <p>Furthermore, the services and types can be classified using taxonomies. These
taxonomies are expressed as ontologies in OWL format. Although we also provide
a seamlessly integrated graphical editor for these OWL files (see Figure 2), the
domain expert may use any OWL tool according to his personal preference.
<sup>4</sup> PROPHETS: Process Realization and Optimization Platform using a Human-readable
Expression of Temporal-logic Synthesis</p>
        <p>Finally, there might be domain-specific knowledge like ordering constraints on
services or general compatibility information. This knowledge must be formalized
by the domain expert. There are two options for doing so: either
he expresses model checking formulas that must hold for every SLG within the
project, or he defines global constraints that are used for every process synthesis.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Process Design</title>
        <p>After a domain has been set up by the domain expert, it can be used by the
process expert for workflow design. He does not need to provide fully specified
processes, because model parts that are marked as incomplete can be
automatically synthesized by the framework. As part of the seamless integration into the
jABC, the new extension concentrates on usability for non-technical users.
It facilitates incomplete specification of processes in an easily accessible way
by introducing loosely specified branches, which the synthesis replaces by
concrete solutions. Figure 3 (background) shows an example model using a loosely
specified branch (colored red to be distinguishable from normal branches).</p>
        <p>
          Behind the scenes, the algorithm requires a formal specification of the synthesis
problem using a configuration universe and a formula in the temporal logic SLTL
(see [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] for details). Our goal with the approach presented here is to hide this
formal complexity from the user and replace it by intuitive (graphical) modeling
concepts. Furthermore, the actual execution of the synthesis is presented to the
user as a set of wizard windows, where he can finally choose the favored solution
from the list of all possible solutions ("Wizard Step 2" in Figure 3).
        </p>
        <p>The synthesis algorithm requires a set of start types as the initial state within the
configuration universe. Previously, these start types had to be specified
manually. They are now determined automatically from the preceding SIBs using
data-flow analysis methods. The types that are available independently of
the execution path in the model are taken as start types. The input types of a
loosely specified branch's goal SIB form the goal types for the synthesis. Both
are implicitly specified by the user by marking branches as loosely specified.</p>
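        <p>The determination of start types can be pictured as a forward data-flow analysis that keeps only those types that are available on every path. The sketch below is one possible formulation under simplifying assumptions: the process model is given as a plain successor map, types are matched by name only, and the function name available_types as well as the four-node example graph are hypothetical. It is not the analysis code of the framework.</p>
        <preformat>
```python
def available_types(successors, outputs, start):
    """Forward 'must' data-flow analysis over a process graph.

    Returns, per node, the types that are available after the node has
    executed on every path leading from `start` to that node.
    """
    all_types = set().union(*outputs.values())
    predecessors = {node: [] for node in successors}
    for node, targets in successors.items():
        for target in targets:
            predecessors[target].append(node)

    # Optimistic initialization: assume everything is available, then shrink.
    after = {node: set(all_types) for node in successors}
    after[start] = set(outputs[start])

    changed = True
    while changed:
        changed = False
        for node in successors:
            if node == start:
                continue
            incoming = [after[p] for p in predecessors[node]]
            before = set.intersection(*incoming) if incoming else set()
            new_after = before | outputs[node]
            if new_after != after[node]:
                after[node] = new_after
                changed = True
    return after


# Hypothetical model: "load" branches to "align" or "skip", both lead to "next".
successors = {"load": ["align", "skip"], "align": ["next"], "skip": ["next"], "next": []}
outputs = {"load": {"Sequence"}, "align": {"Alignment"}, "skip": set(), "next": set()}
result = available_types(successors, outputs, "load")
start_types = result["next"]
```
        </preformat>
        <p>In this example an Alignment is produced on only one of the two branches, so a loosely specified branch leaving the node "next" would start from the type Sequence alone.</p>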
        <p>As stated above, the synthesis requires constraints that are expressed in the
temporal logic SLTL. As we do not expect common process experts to deal with this
formal specification, we provide means to express constraints using a system that
is based on templates in natural language. The user chooses a restricting concept
and then simply has to fill in a cloze text with prepared values ("Wizard Step 1"
in Figure 3). The templates can easily be extended to the needs of the specific
domain. The possible values for the cloze text fields are automatically extracted
from the domain (i.e. module definition and semantic classification).</p>
        <p>The previous subsection already mentioned that the domain expert can define
global knowledge by means of model checking formulas to describe properties that
must hold for any model in this domain. In addition to these manually defined
formulas, our framework can automatically generate formulas that check the type
consistency of the given model. The type usage is considered to be consistent
if no execution path is possible that contains a SIB with an input type
that has not been generated as output on this very path. This verification is
done by a combination of data-flow analysis (the same as is used for start type
determination) to annotate available types to SIBs and a local check of whether every
SIB has all required types (i.e. all input types) annotated that way.
</p>
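        <p>The consistency criterion itself can be illustrated on a single execution path. The following sketch walks one path, accumulates the types produced so far, and reports the first service whose input types are not all available. It checks only one given path rather than performing the data-flow analysis over the whole model, and the function name first_type_error and the simplified type signatures are assumptions made for this example.</p>
        <preformat>
```python
def first_type_error(path, signatures):
    """Walk one execution path and report the first SIB with a missing input.

    `signatures` maps a service name to (input_types, output_types).
    Returns None if the path is type-consistent, else (service, missing_types).
    """
    available = set()
    for service in path:
        inputs, outputs = signatures[service]
        missing = set(inputs) - available
        if missing:
            return service, missing
        available |= set(outputs)
    return None


# Assumed, simplified signatures for illustration only.
signatures = {
    "makeprotseq": (set(), {"Sequence"}),
    "emma": ({"Sequence"}, {"Alignment"}),
    "ehmmbuild": ({"Alignment"}, {"HMM"}),
    "ehmmemit": ({"HMM"}, {"Sequence"}),
    "showseq": ({"Sequence"}, set()),
}

consistent = first_type_error(
    ["makeprotseq", "emma", "ehmmbuild", "ehmmemit", "showseq"], signatures)
broken = first_type_error(["makeprotseq", "ehmmbuild", "showseq"], signatures)
```
        </preformat>
        <p>Under these assumed signatures the first path is consistent, whereas the second one is rejected at ehmmbuild because no Alignment has been produced before it.</p>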
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Setting up the EMBOSS Domain</title>
      <p>
        EMBOSS (European Molecular Biology Open Software Suite [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]) is a collection
of freely available tools for the molecular biology user community. It contains a
number of small and large programs for a wide range of tasks, such as sequence
alignment, database searches, protein motif identification, nucleotide sequence
pattern analysis, and codon usage analysis as well as the preparation of data
for presentation and publication<sup>5</sup>. As of October 2009, EMBOSS (Release 6.1.0)
consists of around 230 tools, some derived from originally standalone packages.
      </p>
      <p>EMBOSS provides a common look and feel for the diverse tools that are
contained in the suite. They can easily be run from the command line, or accessed
from other programs. Thus, it is also suitable for being set up behind GUIs and
web interfaces. What is more, EMBOSS automatically copes with data in a
variety of formats, even allowing for transparent retrieval of sequence data from
the web. This enables us to focus on the actual service semantics rather than on
technical details of data compatibilities when setting up the domain.</p>
      <p>
        Of the around 230 tools of the complete EMBOSS suite, 175 are currently
integrated in our domain. For presentation in this paper we use a representative
subset of this domain, consisting mainly of the HMMER [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] applications.
HMMER is a software package for biosequence analysis using Profile Hidden Markov Models
[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. It contributes 9 applications to EMBOSS, namely ehmmalign, ehmmbuild,
ehmmcalibrate, ehmmconvert, ehmmemit, ehmmfetch, ehmmindex, ehmmpfam,
and ehmmsearch. The prefix 'e' is used to distinguish the EMBOSS
integration from the original HMMER programs. In addition to the HMMER tools,
our domain contains the multiple sequence analysis tools emma and edialign,
makeprotseq and makenucseq for the generation of random protein and
nucleotide sequences, respectively, as well as some tools for the display of specific
data. A complete list of the services in our domain subset is given in Table 1.
      </p>
      <p>Additional structuring of the domain is provided by the classification of
types and services in taxonomies, which are simple ontologies that relate
entities in terms of is-a relations. Figure 4 shows the service taxonomy that we
defined for the HMMER subset of our domain. The generic type Thing
(center) represents the root of the taxonomy, underneath which four abstract service
groups are defined. The abstract group Edit has the services makenucseq and
makeprotseq as instances; the services showseq, showalign and showtext are
classified as Display by the taxonomy. Edialign and emma are abstractly
described as AlignmentMultiple; the remaining tools belong to the HMM group.
<sup>5</sup> http://emboss.sourceforge.net/index.html</p>
      <p>The type taxonomy for the subset of the domain is shown in Figure 5. All
services in this subset work on text-based data, thus all available types belong to
the Text group. The different Sequence types are distinguished further into the
groups ProteinSequence, NucleotideSequence, and MultipleSequence. Note
that some types are instances of multiple groups: MultipleNucleotideSequence,
for instance, is both a MultipleSequence and NucleotideSequence.</p>
      <p>Currently the service taxonomy for our complete EMBOSS domain contains
the 175 services and 42 abstract groups, which for the most part correspond to
the groups that EMBOSS defines. The type taxonomy for the complete domain
consists of 135 different data types and 11 abstract classifications. This large
number of concrete data types is due to the fact that several tools that are
integrated in EMBOSS produce specific tool outputs, often in addition to data
that is formatted in a common format. Although these tool outputs cannot
directly be used as input to other tools, they are relevant to the domain, since
it is possible to extract information from them that is suitable as input data.</p>
    </sec>
    <sec id="sec-4">
      <title>Working with the EMBOSS Domain</title>
      <p>In the previous section we described the setup of the EMBOSS domain, which is
the task of the domain expert. In this section, we illustrate the work of the
application expert, who designs the actual analysis processes dealing with particular
biological questions.</p>
      <p>As a first example we consider the small workflow in Figure 6 (A): it consists
of the services makeprotseq<sup>6</sup> and showalign, which are connected by a loosely
specified default branch. The synthesis problem that is defined by the loose
branch is simply given by the output type of makeprotseq, providing the input
type for the synthesized sequence, and the input type of showalign, which is the
type that the synthesized sequence must finally produce. That is, the synthesis
algorithm has to find a way from MultipleProteinSequence to Alignment.
Obviously, this request can be met by inserting a single multiple sequence alignment
service, for example emma. Figure 6 (B) shows the result.</p>
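      <p>To make the shape of this synthesis problem concrete, the following sketch searches for a shortest service sequence that leads from a set of start types to a goal type. It uses a plain breadth-first search over sets of available types, which is a deliberately simple substitute and not the SLTL-based synthesis algorithm of the framework. The is-a relations, the service signatures and the function names are assumptions made for this toy domain.</p>
      <preformat>
```python
from collections import deque

# Assumed toy domain: is-a relations and service signatures are illustrative.
SUPERTYPES = {
    "MultipleProteinSequence": {"MultipleSequence", "ProteinSequence", "Sequence"},
}
SERVICES = {
    "emma": ({"MultipleSequence"}, {"Alignment"}),
    "ehmmbuild": ({"Alignment"}, {"HMM"}),
    "ehmmemit": ({"HMM"}, {"MultipleProteinSequence"}),
}


def close_under_is_a(types):
    """Add every supertype of the given types."""
    closed = set(types)
    for t in types:
        closed |= SUPERTYPES.get(t, set())
    return frozenset(closed)


def shortest_sequence(start_types, goal_type, services=SERVICES):
    """Breadth-first search for a shortest service sequence yielding goal_type."""
    start = close_under_is_a(start_types)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        available, sequence = queue.popleft()
        if goal_type in available:
            return sequence
        for name, (inputs, outputs) in sorted(services.items()):
            if inputs.issubset(available):
                successor = close_under_is_a(available | outputs)
                if successor not in seen:
                    seen.add(successor)
                    queue.append((successor, sequence + [name]))
    return None


to_alignment = shortest_sequence({"MultipleProteinSequence"}, "Alignment")
to_hmm = shortest_sequence({"MultipleProteinSequence"}, "HMM")
to_sequence = shortest_sequence({"MultipleProteinSequence"}, "Sequence")
```
      </preformat>
      <p>In this toy domain the search inserts emma to reach an Alignment, and it returns the empty sequence when the goal type Sequence is already covered by the start types.</p>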
      <p>A similar synthesis problem is defined by the process shown in Figure 6 (C),
where the type Sequence must be produced. Obviously, the shortest solution is
an empty service sequence, as makeprotseq already provides a suitable input for
showseq. We might, however, have a process in mind that does some analysis
on the initially generated sequences and produces another set of sequences, for
instance via a Profile HMM. As already indicated in Section 2, additional
constraints can be used in the workflow specification that is given to the synthesis
algorithm. For expressing the sketched case, we can give an additional constraint
to the synthesis algorithm that enforces the use of the service ehmmemit. One
of the shortest processes that satisfy this constraint is given in Figure 6 (D): the initial
input sequences are converted into an Alignment by emma, which is then used by
ehmmbuild to create a Profile HMM. Ehmmemit emits a set of sequences based
on this HMM that are finally displayed by showseq.
<sup>6</sup> For simplicity, we let our example processes begin with services that randomly
generate sequences that can be processed further. Note that they can easily be replaced
by the retrieval of sequences from a public database, or by loading a sequence file.</p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref17 ref9">17, 9</xref>
        ] we showed how model checking techniques can be applied to
monitor global properties of the process models, and used them preliminarily to detect
mismatching data types. However, model checking can also validate higher-level
constraints that are expressed in terms of application-level domain knowledge.
As an example, consider the process in Figure 7 (A). It corresponds to the result
from Figure 6 (D). Now, we might want to calibrate each built HMM before it
actually emits sequences. Formally, this is expressed as
      </p>
      <p>ehmmbuild ⇒ (¬ehmmemit WU ehmmcalibrate)
denoting that the use of ehmmbuild implies that ehmmemit is not used before
ehmmcalibrate has been executed (WU is the weak-until operator). As Figure 7 (A) shows, this requirement is
not met by the previously created process, because the ehmmbuild SIB does not
fulfill the property (indicated by the red overlay icon in the lower right corner of
the SIB). Inserting the ehmmcalibrate service into the workflow fixes this issue,
as Figure 7 (B) shows: all SIBs are marked by a green icon. Naturally, and as
(C) shows, this constraint is also fulfilled if the HMM is not built by the process,
but fetched from an HMM database.</p>
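      <p>The meaning of this constraint can be spelled out for a finite, linear sequence of services. The sketch below evaluates the rule at every occurrence of ehmmbuild and rejects the sequence if ehmmemit appears before ehmmcalibrate. It is a hand-written check for this one formula on a single path, not the model checker of the jABC, and the function name as well as the three example sequences (in particular the one using ehmmfetch) are assumptions chosen to mirror the situations described for Figure 7.</p>
      <preformat>
```python
def satisfies_calibration_rule(sequence,
                               trigger="ehmmbuild",
                               forbidden="ehmmemit",
                               release="ehmmcalibrate"):
    """Check: trigger implies (not forbidden) weak-until release.

    The rule is evaluated at every occurrence of `trigger` on a finite,
    linear service sequence.
    """
    for index, service in enumerate(sequence):
        if service != trigger:
            continue
        for later in sequence[index + 1:]:
            if later == release:
                break          # obligation discharged for this trigger
            if later == forbidden:
                return False   # forbidden service used before release
    return True


uncalibrated = ["makeprotseq", "emma", "ehmmbuild", "ehmmemit", "showseq"]
calibrated = ["makeprotseq", "emma", "ehmmbuild", "ehmmcalibrate",
              "ehmmemit", "showseq"]
fetched = ["ehmmfetch", "ehmmemit", "showseq"]

check_a = satisfies_calibration_rule(uncalibrated)
check_b = satisfies_calibration_rule(calibrated)
check_c = satisfies_calibration_rule(fetched)
```
      </preformat>
      <p>Because the operator is a weak until, a sequence that contains ehmmbuild but neither ehmmemit nor ehmmcalibrate afterwards also satisfies the rule.</p>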
      <p>As a third and final example in this paper, we discuss the process that we
already showed in Figure 3 to illustrate the use of the synthesis plugin, which
shows a process that does not (yet) contain any EMBOSS services. A
(nucleotide) sequence is fetched from the DNA Data Bank of Japan, and used for a
BLAST search against a protein database. The Uniprot IDs are extracted from
the BLAST result and then processed in a loop that fetches the Uniprot entry
for each ID. The remainder of the loop body is a loosely specified branch, to be
concretized by an appropriate sequence of services. The synthesis plugin has
access to both the EMBOSS and the DDBJ domain model and can transparently
combine services from both sources.</p>
      <p>For this example, we use the complete EMBOSS domain to find an
appropriate sequence of services that does something with the protein sequence that is
retrieved within the loop. If we start the synthesis with no further constraints,
several thousand possible solutions are found, even if the length of the solution
is limited. The reason lies in the nature of the EMBOSS domain: many tools
work on the same input type (sequence), some again producing sequences, so
that if the synthesis is based only on the type information, unfathomably many
variations of solutions are possible. This shows that adequate domain modeling
requires incorporating as much domain knowledge as possible, far beyond the
mere technical aspects of the different types and services.</p>
      <p>In order to get fewer, but more reasonable, results, we can now formulate some
additional constraints for the synthesis. For instance, we might want the sequence
to end with a Display service, displaying the available data directly or applying
some analysis to the sequence and then displaying the result of the analysis. This
can be expressed using formula templates in the synthesis wizard (see Figure 3,
Wizard Step 1). The list of solutions that is offered by the wizard is still long
(100 out of 653 are displayed, see Figure 3, Wizard Step 2), but the proposed
workflows now meet the intention of the process developer.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>
        Bio-jETI is a framework for model-based, graphical design, execution and
management of bioinformatics analysis processes. Formal methodology like
automatic service composition extends the framework and, in particular, allows for
semantically aware workflow development [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In this study we applied the
workflow synthesis methodology to the EMBOSS suite of sequence analysis tools. As
neither the tool suite itself nor its various interfaces provide ready-to-use
semantic annotations, we set up a domain model that uses a high-level, semantically
meaningful type nomenclature to describe the input/output behavior of the
single EMBOSS tools. Based on this domain model, we demonstrated how working
with the large, heterogeneous, and hence manually intractable EMBOSS
collection is simplified by our service composition methodology.
      </p>
      <p>
        The challenge of semantics-based service composition in the bioinformatics
application domain has also been addressed by a number of other projects. For
instance, the BioMoby project provides a composition functionality for its
services. With the MOBY-S Web Service Browser [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] it is possible to search for an
appropriate next service and store the sequence of executed tools as a Taverna
workflow. Similarly, the REMORA web server [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] offers functionality for the
discovery and step-by-step composition of BioMoby services and the DDBJ's
Web API for biology provides next applicable services according to the outputs
of previously executed services [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. Another example is the approach taken
by [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], who link meaningful terms from the text of a web page to executable
web services, thereby automatically creating workflows that are suitable within
the current context.
      </p>
      <p>Bio-jETI is unique in its holistic perspective, which covers both the scope of
the process modeling and the coverage of individual services and platforms:
- Process development is addressed from a goal-oriented global perspective.
Our loose programming concept allows the user to describe the actually
intended workflow as a whole, and the synthesis finds shortest solutions directly
matching the global intent. In contrast, the automatic service-composition
functionality of the approaches mentioned above is limited to small
subworkflows or even single steps of the analysis process, which comes with the
risk that users get stuck when trying to construct the globally
intended solution step by step. Especially for collections like EMBOSS, where many tools
work on the same data types, a mere local discovery of services is not
productive with respect to the construction of a multi-step workflow.
- Due to the integration into the jABC framework and the decoupled
specification of service descriptions, any kind of heterogeneous resource at any
location can be integrated. There is no restriction to semantically annotated
services of a particular platform. On the contrary, any service that is
available as a jABC component can be enhanced by a semantic service description
and will immediately be available for synthesis of Bio-jETI processes.</p>
      <p>
        Furthermore, Bio-jETI scores with the seamless, user-friendly integration of
the domain modeling and synthesis methodology into a graphical process
management framework, which, due to the loose programming paradigm, enables
application experts to describe their desires in a way that can be
automatically transformed into running solutions. Other approaches to automatic service
composition that we are aware of require their users to work on a far more
technical level. For instance, [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] describe a framework for the composition of
data workflows where a domain ontology is modeled in a first-order logic
language, and relational data descriptions by formulas over concepts and relations
of the ontology. A similar amount of familiarization is required for GOLOG [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ],
which extends the ALGOL programming language by elements of the Situation
Calculus.
      </p>
      <p>
        All approaches to (semi-) automatically dealing with the large number of
distributed, heterogeneous services that are available in the bioinformatics
application domain share the difficult task of finding or defining semantically
appropriate service and type descriptions [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]. Projects like the (my)Grid
Ontology [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ], BioCatalogue [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ], BioMoby [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and SSWAP [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] address this issue by
providing knowledge bases that particularly capture bioinformatics data types
and services. We plan to integrate their services and domain knowledge in the
scope of future case studies. The resulting domains will contain far more
heterogeneous services than the comparatively 'closed' EMBOSS domain that we
used for the current study, creating new challenges for the client-side software,
challenges that Bio-jETI is designed for.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Wilkinson</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Links</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>BioMOBY: an open source biological web services proposal</article-title>
          .
          <source>Briefings in Bioinformatics</source>
          <volume>3</volume>
          (
          <issue>4</issue>
          ) (
          <year>December 2002</year>
          )
          <volume>331</volume>
          -
          <fpage>341</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Rice</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Longden</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bleasby</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>EMBOSS: the European Molecular Biology Open Software Suite</article-title>
          .
          <source>Trends in Genetics: TIG</source>
          <volume>16</volume>
          (
          <issue>6</issue>
          ) (
          <year>June 2000</year>
          )
          <fpage>276</fpage>
          –
          <lpage>277</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bausch</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pautasso</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alonso</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>BioOpera: Cluster-aware Computing</article-title>
          .
          <source>In: Proceedings of the 4th IEEE International Conference on Cluster Computing (Cluster 2002)</source>
          . (
          <year>2002</year>
          )
          <fpage>99</fpage>
          –
          <lpage>106</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Eker</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Janneck</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , et al.:
          <article-title>Taming heterogeneity - the Ptolemy approach</article-title>
          .
          <source>Proceedings of the IEEE</source>
          <volume>91</volume>
          (
          <issue>1</issue>
          )
          (
          <year>2003</year>
          )
          <fpage>127</fpage>
          –
          <lpage>144</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Altintas</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berkley</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jaeger</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , et al.:
          <article-title>Kepler: An Extensible System for Design and Execution of Scientific Workflows</article-title>
          .
          <source>In: SSDBM</source>
          (
          <year>2004</year>
          )
          <fpage>21</fpage>
          –
          <lpage>23</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Oinn</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Addis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferris</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al.:
          <article-title>Taverna: a tool for the composition and enactment of bioinformatics workflows</article-title>
          .
          <source>Bioinformatics</source>
          <volume>20</volume>
          (
          <issue>17</issue>
          ) (
          <year>2004</year>
          )
          <fpage>3045</fpage>
          –
          <lpage>3054</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Taylor</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shields</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harrison</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>The Triana Workflow Environment: Architecture and Applications</article-title>
          . In: Workflows for e-Science. Springer, New York, Secaucus, NJ, USA (
          <year>2007</year>
          )
          <fpage>320</fpage>
          –
          <lpage>339</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Margaria</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kubczak</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steffen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Bio-jETI: a service integration, design, and provisioning platform for orchestrated bioinformatics processes</article-title>
          .
          <source>BMC Bioinformatics</source>
          <volume>9</volume>
          (
          <issue>Suppl 4</issue>
          )
          (
          <year>2008</year>
          ) S12
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Lamprecht</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Margaria</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steffen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Bio-jETI: a framework for semantics-based service composition</article-title>
          .
          <source>BMC Bioinformatics</source>
          <volume>10</volume>
          (
          <issue>Suppl 10</issue>
          )
          (
          <year>2009</year>
          ) S8
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Margaria</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kubczak</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Njoku</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steffen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Model-based Design of Distributed Collaborative Bioinformatics Processes in the jABC</article-title>
          .
          <source>In: Proceedings of ICECCS, IEEE Computer Society</source>
          (
          <year>2006</year>
          )
          <fpage>169</fpage>
          –
          <lpage>176</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kubczak</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Margaria</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fritsch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steffen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Biological LC/MS Preprocessing and Analysis with jABC, jETI and xcms</article-title>
          .
          <source>In: Second International Symposium on Leveraging Applications of Formal Methods, Verification and Validation (ISoLA 2006)</source>
          . (
          <year>2006</year>
          )
          <fpage>303</fpage>
          –
          <lpage>308</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lamprecht</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Margaria</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steffen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , et al.:
          <article-title>GeneFisher-P: variations of GeneFisher as processes in Bio-jETI</article-title>
          .
          <source>BMC Bioinformatics</source>
          <volume>9</volume>
          (
          <issue>Suppl 4</issue>
          )
          (
          <year>2008</year>
          ) S13
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Lamprecht</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Margaria</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steffen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Seven Variations of an Alignment Workflow - An Illustration of Agile Process Design and Management in Bio-jETI</article-title>
          .
          <source>In: Bioinformatics Research and Applications</source>
          . Volume
          <volume>4983</volume>
          of LNBI, Atlanta, Georgia, Springer (
          <year>2008</year>
          )
          <fpage>445</fpage>
          –
          <lpage>456</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Steffen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Margaria</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nagel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , et al.:
          <article-title>Model-Driven Development with the jABC</article-title>
          .
          <source>In: Hardware and Software, Verification and Testing</source>
          . (
          <year>2007</year>
          )
          <fpage>92</fpage>
          –
          <lpage>108</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Margaria</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nagel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steffen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>jETI: A Tool for Remote Tool Integration</article-title>
          .
          <source>In: Tools and Algorithms for the Construction and Analysis of Systems</source>
          . Volume
          <volume>3440</volume>
          /2005 of LNCS, Springer Berlin/Heidelberg (
          <year>2005</year>
          )
          <fpage>557</fpage>
          –
          <lpage>562</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. Jorges,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Margaria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Ste</surname>
          </string-name>
          <string-name>
            <surname>en</surname>
          </string-name>
          , B.:
          <article-title>Genesys: service-oriented construction of property conform code generators</article-title>
          .
          <source>Innovations in Systems and Software Engineering</source>
          <volume>4</volume>
          (
          <issue>4</issue>
          ) (
          <year>December 2008</year>
          )
          <fpage>361</fpage>
          –
          <lpage>384</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Lamprecht</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Margaria</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steffen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Supporting Process Development in Bio-jETI by Model Checking and Synthesis</article-title>
          .
          <source>In: Proc. of 1st Workshop SWAT4LS08</source>
          , Edinburgh, United Kingdom,
          <source>CEUR Workshop Proceedings</source>
          (
          <year>November 2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Steffen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Margaria</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freitag</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Module configuration by minimal model construction</article-title>
          . (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Eddy</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Profile hidden Markov models</article-title>
          .
          <source>Bioinformatics</source>
          (Oxford, England)
          <volume>14</volume>
          (
          <issue>9</issue>
          ) (
          <year>1998</year>
          )
          <fpage>755</fpage>
          –
          <lpage>763</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Eddy</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>HMMER: biosequence analysis using profile hidden Markov models</article-title>
          . http://hmmer.janelia.org/
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Dibernardo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pottinger</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilkinson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Semi-automatic web service composition for the life sciences using the BioMoby semantic web framework</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          (
          <year>March 2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Carrere</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gouzy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>REMORA: a pilot in the ocean of BioMoby web-services</article-title>
          .
          <source>Bioinformatics</source>
          (Oxford, England)
          <volume>22</volume>
          (
          <issue>7</issue>
          ) (
          <year>April 2006</year>
          )
          <fpage>900</fpage>
          –
          <lpage>901</lpage>
          . PMID: 16423924.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Kwon</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shigemoto</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuwana</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sugawara</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Web API for biology with a workflow navigation system</article-title>
          .
          <source>Nucl. Acids Res</source>
          .
          <volume>37</volume>
          (
          <issue>suppl 2</issue>
          ) (
          <year>July 2009</year>
          )
          <fpage>W11</fpage>
          –
          <lpage>16</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Sutherland</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McLeod</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferguson</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burger</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Knowledge-driven enhancements for task composition in bioinformatics</article-title>
          .
          <source>BMC Bioinformatics</source>
          <volume>10</volume>
          (
          <issue>Suppl 10</issue>
          )
          (
          <year>2009</year>
          ) S12
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Ambite</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kapoor</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Automatically composing data workflows with relational descriptions and shim services</article-title>
          .
          <source>In: The Semantic Web</source>
          . (
          <year>2008</year>
          )
          <fpage>15</fpage>
          –
          <lpage>29</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Levesque</surname>
            ,
            <given-names>H.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reiter</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lespérance</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , et al.:
          <article-title>GOLOG: a logic programming language for dynamic domains</article-title>
          .
          <source>Journal of Logic Programming</source>
          <volume>31</volume>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Lord</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bechhofer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilkinson</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          , et al.:
          <article-title>Applying Semantic Web Services to Bioinformatics: Experiences Gained, Lessons Learnt</article-title>
          .
          <source>In: The Semantic Web – ISWC 2004</source>
          . (
          <year>2004</year>
          )
          <fpage>350</fpage>
          –
          <lpage>364</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Wolstencroft</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alper</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hull</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , et al.:
          <article-title>The (my)Grid ontology: bioinformatics service discovery</article-title>
          .
          <source>International Journal of Bioinformatics Research and Applications</source>
          <volume>3</volume>
          (
          <issue>3</issue>
          ) (
          <year>2007</year>
          )
          <fpage>303</fpage>
          –
          <lpage>325</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Goble</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belhajjame</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tanoh</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , et al.:
          <article-title>BioCatalogue: A Curated Web Service Registry For The Life Science Community</article-title>
          (
          <year>April 2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Gessler</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>SSWAP - Simple Semantic Web Architecture and Protocol</article-title>
          . http://sswap.info/docs/SSWAP.pdf
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>