<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semi-automatic generation of Semantic Web Services for relational biological databases</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Julien Wollbrett</string-name>
          <email>julien.wollbrett@cirad.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierre Larmande</string-name>
          <email>pierre.larmande@ird.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manuel Ruiz</string-name>
          <email>manuel.ruiz@cirad.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CIRAD, UMR AGAP</institution>
          ,
          <addr-line>F-34398 Montpellier</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IRD, UMR DIADE</institution>
          ,
          <addr-line>Montpellier</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In recent years, a large amount of “-omics” data has been produced. However, these data are stored in many different species-specific databases that are managed by different institutes and laboratories. Biologists often need to find and assemble data from disparate sources to perform certain analyses. Searching for and assembling these data is time-consuming. The Semantic Web facilitates interoperability across databases. A common approach involves the development of wrapper systems that map a relational database schema onto existing domain ontologies. However, few attempts have been made to automate the creation of such wrappers. We developed a framework, named BioSemantic, for the creation of Semantic Web Services applicable to relational biological databases. This framework makes use of both Semantic Web and Web Services technologies and can be divided into two main parts: (i) the generation and semi-automatic annotation of an RDF view; and (ii) the automatic generation of Semantic Web Services. We have used our framework to integrate genomic data from different plant databases. BioSemantic is a framework designed to speed up the development of Semantic Web Services for existing relational biological databases. Currently, it creates and annotates RDF views that enable the automatic generation of SPARQL queries. Web Services are also created and deployed automatically, and the semantic annotations of our Web Services are added automatically using SAWSDL attributes. BioSemantic can be downloaded from http://southgreen.cirad.fr/?q=content/Biosemantic.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic Web Services</kwd>
        <kwd>ontology driven data integration</kwd>
        <kwd>SPARQL query formulation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Below, we detail the entire process for generating a BioSemantic Semantic Web Service (SWS), which can be divided into two
main parts: (i) the generation and semi-automatic annotation of an RDF view (Fig. 1); and (ii) the automatic
generation of the Semantic Web Service (Fig. 2).
</p>
    </sec>
    <sec id="sec-2">
      <title>Generation and semi-automatic annotation of an RDF View</title>
      <p>A local RDF view of the database schema is automatically created for each relational database to be
integrated. Experts then manually annotate the RDF view with terms from existing bio-ontologies.
The RDF views, once created and annotated, are stored in an RDF repository (Fig. 1).</p>
      <sec id="sec-2-1">
        <title>Relational database-to-RDF mapping</title>
        <p>
          Research on mapping between databases and ontologies is very active and reflects
various motivations and approaches [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. In BioSemantic, we use the mapping as an intermediate layer between
the user and the stored data. This layer provides an abstraction of the database and allows the user to query
databases without knowing their schemas. These characteristics correspond to the motivation known
as “ontology-based data access”. For that purpose, we found only two tools that strictly use Semantic Web
standards: Virtuoso [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and D2RQ [
          <xref ref-type="bibr" rid="ref10 ref8 ref9">8–10</xref>
          ]. We chose D2RQ because it is free and open source and
has been used successfully in several bioinformatics projects. With D2RQ, we can automatically
generate a mapping file that provides an RDF view of the database schema.
        </p>
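        <p>As a rough illustration of this step, the sketch below assembles a command line for the generate-mapping script distributed with D2RQ. The JDBC URL, credentials and output file name are placeholders and not values used by BioSemantic, and the helper name build_generate_mapping_command is hypothetical.</p>
        <preformat>
```python
def build_generate_mapping_command(jdbc_url, user, password, outfile):
    """Return the argument list for D2RQ's generate-mapping script.

    All four arguments are placeholders supplied by the caller; the script
    is assumed to be available on the PATH.
    """
    return [
        "generate-mapping",
        "-u", user,        # database user
        "-p", password,    # database password
        "-o", outfile,     # mapping file to write
        jdbc_url,          # JDBC connection URL of the relational database
    ]


# Hypothetical database and credentials, for illustration only.
command = build_generate_mapping_command(
    "jdbc:mysql://localhost/example_db", "reader", "secret", "example_mapping.ttl"
)
print(" ".join(command))
```
</preformat>
        <p>The returned list can be passed to subprocess.run on a machine where D2RQ is installed; the sketch only prints it so that it runs without a database.</p>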
      </sec>
      <sec id="sec-2-2">
        <title>RDF view description</title>
        <p>The RDF view generated by D2RQ contains the elements of the database schema: entities, attributes, keys
(primary and foreign) and metadata, such as the database driver and host. The data contained in the relational
databases are not included in the RDF view. Consequently, both the D2RQ API and the RDF view are needed
when the data are accessed through SPARQL queries.</p>
        <p>In the RDF view, the database schema is represented by a graph. Each node corresponds to an entity or
attribute in the database, and each edge defines a relationship between two nodes. In RDF format, namespaces
are used to uniquely identify each node. Namespaces provide a prefix for each node name. For example, the
map:marker node indicates the “marker” concept from the “map” vocabulary.
</p>
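        <p>To make the graph structure concrete, the following sketch lists a schema fragment as subject, predicate, object triples. The marker entity and its name attribute are invented for illustration; the d2rq: terms come from the D2RQ mapping language, but the exact triples that D2RQ emits for a given database may differ.</p>
        <preformat>
```python
# A toy RDF view: each triple is (subject, predicate, object).
# Prefixed names such as "map:marker" combine a namespace prefix and a local name.
RDF_VIEW = [
    ("map:database", "rdf:type", "d2rq:Database"),
    ("map:marker", "rdf:type", "d2rq:ClassMap"),
    ("map:marker", "d2rq:dataStorage", "map:database"),
    ("map:marker_name", "rdf:type", "d2rq:PropertyBridge"),
    ("map:marker_name", "d2rq:belongsToClassMap", "map:marker"),
    ("map:marker_name", "d2rq:column", "marker.name"),
]


def split_prefixed_name(name):
    """Split 'prefix:local' into (prefix, local)."""
    prefix, _, local = name.partition(":")
    return prefix, local


def nodes_with_type(triples, rdf_type):
    """Return the subjects declared with the given rdf:type, in list order."""
    return [s for s, p, o in triples if p == "rdf:type" and o == rdf_type]


def properties_of(triples, class_map):
    """Return the property bridges (attributes) attached to a class map (entity)."""
    return [s for s, p, o in triples
            if p == "d2rq:belongsToClassMap" and o == class_map]


print(split_prefixed_name("map:marker"))
print(nodes_with_type(RDF_VIEW, "d2rq:ClassMap"))
print(properties_of(RDF_VIEW, "map:marker"))
```
</preformat>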
      </sec>
      <sec id="sec-2-3">
        <title>Automatic semantic enrichment of the RDF view with BioSemantic</title>
        <p>The BioSemantic API automatically detects specific information related to the relational database schema and
translates it into new properties that can be integrated into the RDF view. These metadata are then used for
SPARQL query generation. This step can be seen as a semantic enrichment of the RDF view.</p>
        <p>For this purpose, we have developed an algorithm that detects association tables:</p>
      </sec>
      <sec id="sec-2-4">
        <title>Association tables</title>
        <preformat>{
    pk = primary key of R
    fk = foreign keys of R
    {
        R is an association table
    }
}</preformat>
        <p>We can also detect the arity of association tables, i.e., the number of foreign keys they contain. The algorithm
labels association tables in the RDF view with the dr:associatedTo property and indicates the arity with the
dr:arity property.</p>
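        <p>The detection condition is not spelled out here, so the sketch below assumes a common heuristic: a table is treated as an association table when it has at least two foreign keys and every primary-key column is one of them. For simplicity, each foreign key is assumed to span a single column. The schema dictionary, the table names and the objects chosen for dr:associatedTo are hypothetical.</p>
        <preformat>
```python
def detect_association_tables(schema):
    """Return (subject, predicate, object) triples describing association tables.

    `schema` maps a table name to a dict with:
      "pk":  list of primary-key column names
      "fks": dict mapping a foreign-key column to the table it references
    Assumed heuristic (not taken from the source): at least two foreign keys,
    and every primary-key column is a foreign-key column.
    """
    triples = []
    for table, info in schema.items():
        pk = set(info["pk"])
        fk_columns = set(info["fks"])
        if len(fk_columns) >= 2 and pk and pk.issubset(fk_columns):
            # Arity = number of foreign keys held by the association table.
            triples.append((table, "dr:arity", len(fk_columns)))
            for target in sorted(set(info["fks"].values())):
                # Assumption: dr:associatedTo points at each referenced table.
                triples.append((table, "dr:associatedTo", target))
    return triples


# Hypothetical schema: marker_map links markers to genetic maps.
SCHEMA = {
    "marker": {"pk": ["marker_id"], "fks": {}},
    "map": {"pk": ["map_id"], "fks": {}},
    "marker_map": {
        "pk": ["marker_id", "map_id"],
        "fks": {"marker_id": "marker", "map_id": "map"},
    },
}

for triple in detect_association_tables(SCHEMA):
    print(triple)
```
</preformat>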
      </sec>
      <sec id="sec-2-5">
        <title>Inheritance, aggregation and composition</title>
        <p>
          There are many ways to transform inheritance relationships from an object-oriented conceptual model to a
relational model [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Our algorithm detects relationships that result from mapping each class
of an inheritance hierarchy to its own table. We also detect tables resulting from aggregation or composition
relationships by using the identification algorithm from [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. We label these relationships in the RDF view with the
rdfs:subClassOf property.
        </p>
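        <p>One way to detect the table-per-class pattern, assumed here for illustration and not necessarily the rule implemented in BioSemantic, is to look for tables whose whole primary key is also a foreign key to a single parent table. The sketch covers inheritance only; aggregation and composition are left out, and the schema is hypothetical.</p>
        <preformat>
```python
def detect_subclass_tables(schema):
    """Return (child, "rdfs:subClassOf", parent) triples.

    `schema` maps a table name to {"pk": [columns], "fks": {column: table}}.
    Assumed heuristic: every primary-key column of the child is a foreign key,
    and all of them reference the same parent table.
    """
    triples = []
    for table, info in schema.items():
        pk = info["pk"]
        # Parent referenced by each primary-key column (None if not a foreign key).
        targets = {info["fks"].get(column) for column in pk}
        if pk and None not in targets and len(targets) == 1:
            # rdfs:subClassOf is the RDF Schema subclass property.
            triples.append((table, "rdfs:subClassOf", targets.pop()))
    return triples


# Hypothetical schema: ssr_marker specialises marker.
SCHEMA = {
    "marker": {"pk": ["marker_id"], "fks": {}},
    "ssr_marker": {"pk": ["marker_id"], "fks": {"marker_id": "marker"}},
    "marker_map": {
        "pk": ["marker_id", "map_id"],
        "fks": {"marker_id": "marker", "map_id": "map"},
    },
}

print(detect_subclass_tables(SCHEMA))
```
</preformat>
        <p>In this toy schema only ssr_marker qualifies: marker has no foreign key on its primary key, and marker_map references two different tables.</p>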
        <p>The annotation of the RDF view is performed manually using a text editor and must be conducted by an
expert familiar with the database and/or bio-ontology.</p>
        <p>Semantic annotations are used to select the inputs and outputs of a query. We search for a path in an RDF
view that links the inputs to the outputs. If such a path is found, it is used to create a SPARQL
query. To automate the creation of SPARQL queries, we implemented a single-pair variant of
the shortest-path algorithm: given an input graph, a source node and a destination node, it returns a path linking
the two nodes through the graph. We add conditions to our shortest-path algorithm according to the types of
relationships between the nodes, which can be either of the following: (i) relationships corresponding to
association tables; or (ii) relationships resulting from inheritance, aggregation or composition in an
object-oriented conceptual model. These conditions correspond to the metadata added to the RDF view during the
automatic semantic enrichment step by the BioSemantic API.</p>
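        <p>A minimal sketch of this idea is given below. It assumes an unweighted graph and uses breadth-first search, since the exact shortest-path variant is not named in the text. The node names, predicates, relationship kinds and the rule that filters edges by kind are hypothetical, and the generated SPARQL simply chains triple patterns.</p>
        <preformat>
```python
from collections import deque


def shortest_path(graph, source, target, allowed_kinds):
    """Breadth-first search returning the edges of one shortest path, or None.

    `graph` maps a node to a list of (neighbour, predicate, kind) triples.
    Only edges whose kind is in `allowed_kinds` are followed.
    """
    queue = deque([source])
    came_from = {source: None}  # node -> (previous node, predicate)
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbour, predicate, kind in graph.get(node, []):
            if kind in allowed_kinds and neighbour not in came_from:
                came_from[neighbour] = (node, predicate)
                queue.append(neighbour)
    if target not in came_from:
        return None
    path = []
    node = target
    while came_from[node] is not None:
        previous, predicate = came_from[node]
        path.append((previous, predicate, node))
        node = previous
    path.reverse()
    return path


def path_to_sparql(path):
    """Chain the edges of a non-empty path into a SELECT query (illustrative only)."""
    if not path:
        raise ValueError("path must contain at least one edge")
    names = {}

    def var(node):
        # Assign ?v0, ?v1, ... in order of first appearance.
        if node not in names:
            names[node] = "?v%d" % len(names)
        return names[node]

    patterns = []
    for subject, predicate, obj in path:
        patterns.append("  %s %s %s ." % (var(subject), predicate, var(obj)))
    return "SELECT %s WHERE {\n%s\n}" % (var(path[-1][2]), "\n".join(patterns))


# Hypothetical enriched view: two routes lead from marker to map.
GRAPH = {
    "marker": [("marker_map", "map:marker_map_marker", "association"),
               ("qtl", "map:qtl_marker", "fk")],
    "marker_map": [("map", "map:marker_map_map", "association")],
    "qtl": [("map", "map:qtl_map", "fk")],
    "map": [("map_name", "map:map_name", "attribute")],
}

path = shortest_path(GRAPH, "marker", "map_name", {"association", "fk", "attribute"})
print(path_to_sparql(path))
```
</preformat>
        <p>Restricting allowed_kinds changes the route: with association edges excluded, the search in this toy graph goes through qtl instead of marker_map.</p>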
        <p>The Web Services developer selects the bio-ontological terms to be used as input/output (Fig. 2). All of the
mapping files, which are stored in the mapping file repository, are automatically parsed to find a path linking the
input and output ontological terms. If such a path is found, it is used to create a SPARQL query. The query is
integrated into a Semantic Web Service that is then registered in a Web Service registry, such as BioCatalogue.</p>
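        <p>The repository scan can be pictured as follows. This sketch assumes that each mapping file has already been parsed into a dictionary from ontology term to annotated node; the file names, node names and term identifiers are invented, and the path search and service generation are left out.</p>
        <preformat>
```python
def find_candidate_views(repository, input_term, output_term):
    """Return (view name, input node, output node) for views annotated with both terms.

    `repository` maps a mapping-file name to a dict {ontology term: node name}.
    """
    candidates = []
    for view_name in sorted(repository):
        annotations = repository[view_name]
        if input_term in annotations and output_term in annotations:
            candidates.append(
                (view_name, annotations[input_term], annotations[output_term])
            )
    return candidates


# Hypothetical repository with placeholder ontology terms.
REPOSITORY = {
    "db_a_mapping.ttl": {"EX:marker": "map:marker", "EX:map_name": "map:map_name"},
    "db_b_mapping.ttl": {"EX:marker": "map:locus"},
}

print(find_candidate_views(REPOSITORY, "EX:marker", "EX:map_name"))
```
</preformat>
        <p>Only views that carry both annotations are kept, because a path can only be searched between two nodes of the same view.</p>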
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgements</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Stein</surname>
            <given-names>LD</given-names>
          </string-name>
          :
          <article-title>Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges</article-title>
          .
          <source>Nat Rev Genet</source>
          <year>2008</year>
          ,
          <volume>9</volume>
          :
          <fpage>678</fpage>
          -
          <lpage>688</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Goble</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stevens</surname>
            <given-names>R</given-names>
          </string-name>
          :
          <article-title>State of the nation in data integration for bioinformatics</article-title>
          .
          <source>J Biomed Inform</source>
          <year>2008</year>
          ,
          <volume>41</volume>
          :
          <fpage>687</fpage>
          -
          <lpage>693</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Gessler</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schiltz</surname>
            <given-names>G</given-names>
          </string-name>
          , et al.:
          <article-title>SSWAP: A Simple Semantic Web Architecture and Protocol for semantic web services</article-title>
          .
          <source>BMC Bioinformatics</source>
          <year>2009</year>
          ,
          <volume>10</volume>
          :
          <fpage>309</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Wilkinson</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCarthy</surname>
            <given-names>L</given-names>
          </string-name>
          , et al.:
          <article-title>SADI, SHARE, and the in silico scientific method</article-title>
          .
          <source>BMC Bioinformatics</source>
          <year>2010</year>
          ,
          <volume>11</volume>
          :
          <fpage>S7</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. The BioMoby Consortium:
          <article-title>Interoperability with Moby 1.0-It's better than sharing your toothbrush!</article-title>
          <source>Briefings in Bioinformatics</source>
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Spanos</surname>
            <given-names>D-E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stavrou</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitrou</surname>
            <given-names>N</given-names>
          </string-name>
          :
          <article-title>Bringing Relational Databases into the Semantic Web: A Survey</article-title>
          .
          <source>IOS Press</source>
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Erling</surname>
            <given-names>O</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikhailov</surname>
            <given-names>I</given-names>
          </string-name>
          :
          <article-title>Mapping Relational Data to RDF in Virtuoso</article-title>
          .
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Miles</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klyne</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>White-Cooper</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shotton</surname>
            <given-names>D</given-names>
          </string-name>
          :
          <article-title>OpenFlyData: An exemplar data web integrating gene expression data on the fruit fly Drosophila melanogaster</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Cheung</surname>
            <given-names>K-H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yip</surname>
            <given-names>KY</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>deKnikker</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Masiar</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gerstein</surname>
            <given-names>M</given-names>
          </string-name>
          :
          <article-title>YeastHub: a semantic web use case for integrating data in the life sciences domain</article-title>
          .
          <source>Bioinformatics</source>
          <year>2005</year>
          ,
          <volume>21</volume>
          :
          <fpage>i85</fpage>
          -
          <lpage>i96</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lam</surname>
            <given-names>HYK</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marenco</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shepherd</surname>
            <given-names>GM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miller</surname>
            <given-names>PL</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cheung</surname>
            <given-names>K-H</given-names>
          </string-name>
          :
          <article-title>Using Web Ontology Language to Integrate Heterogeneous Databases in the Neurosciences</article-title>
          .
          <source>AMIA Annu Symp Proc</source>
          <year>2006</year>
          ,
          <volume>2006</volume>
          :
          <fpage>464</fpage>
          -
          <lpage>468</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Rahayu</surname>
            <given-names>JW</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            <given-names>E</given-names>
          </string-name>
          , et al.:
          <article-title>A methodology for transforming inheritance relationships in an object-oriented conceptual model to relational tables</article-title>
          .
          <source>Information and Software Technology</source>
          <year>2000</year>
          ,
          <volume>42</volume>
          :
          <fpage>571</fpage>
          -
          <lpage>592</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Tirmizi</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sequeda</surname>
            <given-names>J</given-names>
          </string-name>
          , et al.:
          <article-title>Translating SQL Applications to the Semantic Web</article-title>
          .
          In
          <source>Database and Expert Systems Applications</source>
          .
          <year>2008</year>
          :
          <fpage>450</fpage>
          -
          <lpage>464</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>