<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semantic rule processing for real-time data integration</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pedro Lopes</string-name>
          <email>pedrolopes@ua.pt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>José Luís Oliveira</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DETI/IEETA, University of Aveiro</institution>
          ,
          <addr-line>Aveiro</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Integration-as-a-service platforms are emerging as a modern strategy for integrating data from distributed environments. Service interoperability strategies, such as workflows and static service-oriented architectures, are giving way to more dynamic environments, where the path from the original resource to the integrative destination is triggered autonomously and in real time. However, these concepts have not yet been applied to bioinformatics, in great part due to the complexity of the underlying data validation and transformation tasks. In this manuscript we introduce a component that enhances these activities by enabling the execution of complex pre- and post-integration semantic algorithms. By leveraging comprehensive Semantic Web constructs, the component makes these activities better suited to the life sciences domain.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic Web</kwd>
        <kwd>data integration</kwd>
        <kwd>real-time</kwd>
        <kwd>application framework</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Real-time data integration continues to be a challenge [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], particularly in the life
sciences domain [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Whereas several solutions are already in place in other scenarios, such as
mechanical engineering or embedded software, the automated integration of
biomedical data still has plenty of room for innovation.
      </p>
      <p>
        With the evolution of cloud-based technologies, integration-as-a-service concepts are
now being used to describe new platforms that enable the real-time, automated integration
of data and services [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Moreover, the Semantic Web's inherent flexibility and dynamism can be
combined with integration-as-a-service strategies to reach a whole that is greater than
the sum of its parts. In particular, the complex Extract-Transform-Load (ETL) data warehousing
workflow can be improved through the surgical inclusion of semantic-based
components [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        This work introduces such a component, enhancing the integration workflow with
new methods to pre- and post-process data in the integration pipeline [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ]. Data can
be validated or transformed according to a predefined set of customizable rules.
Simpler validation examples include regular expressions to match text content or
Boolean arithmetic comparisons to evaluate numeric values. At a more complex level,
inference and reasoning can be used to transform content before integration.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <p>The semantic rule-processing component will interact directly with the ETL engine
in the integration pipeline (Figure 1). The basic integration workflow comprises
three tasks that move the data from the original source to the desired destination: (1)
data extraction, (2) data transformation, and (3) data loading. Our rule-processing
algorithms will divide the second step, data transformation, into two complementary
activities: validation and transformation.</p>
      <p>[Figure 1: the Extract-Transform-Load engine and the Rule Processing component, moving data from the Origin Resource to the Destination Resource.]</p>
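      <p>The split of the transformation step described above can be pictured as a simple processing loop. The following Python sketch is illustrative only and is not the authors' implementation: the names <monospace>run_etl</monospace>, <monospace>extract</monospace>, <monospace>validate</monospace>, <monospace>transform</monospace>, and <monospace>load</monospace> are hypothetical, content items are assumed to be plain dictionaries, and a failed validation is assumed to stop the integration of that item only, not of the whole batch.</p>
      <preformat>
```python
from typing import Callable, Iterable

Record = dict  # a content item, e.g. {"id": "P12345", "length": 100}


def run_etl(
    extract: Callable[[], Iterable[Record]],
    validate: Callable[[Record], bool],
    transform: Callable[[Record], Record],
    load: Callable[[Record], None],
) -> list:
    """Hypothetical ETL loop in which the 'T' step is split into
    validation followed by transformation.

    Returns the records rejected by validation.
    """
    rejected = []
    for record in extract():          # (1) data extraction
        if not validate(record):      # (2a) validation: Boolean gate
            rejected.append(record)   # integration of this item stops here
            continue
        load(transform(record))       # (2b) transformation, then (3) loading
    return rejected


# Toy usage: an in-memory source and destination.
destination = []
source = [{"id": "P12345", "length": 100}, {"id": "bad", "length": -1}]
rejected = run_etl(
    extract=lambda: iter(source),
    validate=lambda r: r["length"] > 0,
    transform=lambda r: {**r, "id": "uniprot:" + r["id"]},
    load=destination.append,
)
print(destination)  # [{'id': 'uniprot:P12345', 'length': 100}]
print(rejected)     # [{'id': 'bad', 'length': -1}]
```
      </preformat>
      <p>Keeping validation and transformation as separate callables mirrors the two complementary activities named in the text and lets either one be replaced without touching extraction or loading.</p>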
      <p>
        Applying these semantic rules is an interactive workflow that requires
communication between the main application engine, the rule processing engine, and a
knowledge base containing the rule configuration and the translated content. This
process is described next and illustrated in Figure 2, using COEUS [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] as the
system knowledge base:
      </p>
      <list list-type="order">
        <list-item>
          <p>Get content: load the content for processing from COEUS;</p>
        </list-item>
        <list-item>
          <p>Return content: the application engine receives the content graph for semantic rule processing;</p>
        </list-item>
        <list-item>
          <p>Get rules: the application engine requests the semantic processing rules from COEUS;</p>
        </list-item>
        <list-item>
          <p>Return rules: COEUS returns the matching rules, if any;</p>
        </list-item>
        <list-item>
          <p>Process content: the matched validation and transformation rules are applied to the content, which is then sent to the final destination for integration;</p>
        </list-item>
        <list-item>
          <p>Log: all performed activities are logged in the system.</p>
        </list-item>
      </list>
      <p>[Figure 2: rule-processing sequence. 1: Get content; 2: Return content; 3: Get rules; 4: Return rules; if there are no rules, 5.a: Forward content; if there are rules, 5.b.1: Process content, 5.b.2: Return processed content, 5.b.3: Forward content; 6: Log.]</p>
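      <p>A minimal Python sketch of the six steps above, under stated assumptions: an in-memory <monospace>KnowledgeBase</monospace> class stands in for COEUS (its <monospace>get_content</monospace> and <monospace>get_rules</monospace> methods are hypothetical and do not reflect the real COEUS API), rules are plain functions that take and return a content dictionary, and Python's standard <monospace>logging</monospace> module plays the role of the system log.</p>
      <preformat>
```python
import logging
from dataclasses import dataclass, field
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("rule-engine")

Rule = Callable[[dict], dict]  # takes a content item, returns the processed item


@dataclass
class KnowledgeBase:
    """In-memory stand-in for COEUS (hypothetical API, not the real one)."""
    content: dict = field(default_factory=dict)  # content id -> content item
    rules: dict = field(default_factory=dict)    # content id -> list of rules

    def get_content(self, content_id: str) -> dict:
        return self.content[content_id]

    def get_rules(self, content_id: str) -> list[Rule]:
        return self.rules.get(content_id, [])


def process(kb: KnowledgeBase, content_id: str, forward: Callable[[dict], None]) -> None:
    content = kb.get_content(content_id)  # steps 1-2: get and return content
    rules = kb.get_rules(content_id)      # steps 3-4: get and return matching rules
    if not rules:
        forward(content)                  # 5.a: no rules, forward content unchanged
        log.info("%s: no rules, forwarded as-is", content_id)  # step 6: log
        return
    for rule in rules:                    # 5.b.1-5.b.2: process, return processed content
        content = rule(content)
    forward(content)                      # 5.b.3: forward processed content
    log.info("%s: %d rule(s) applied, forwarded", content_id, len(rules))  # step 6: log


# Toy usage: one item with a rule, one without.
kb = KnowledgeBase(
    content={"a": {"name": "brca1"}, "b": {"name": "tp53"}},
    rules={"a": [lambda c: {**c, "name": c["name"].upper()}]},
)
sink = []
process(kb, "a", sink.append)
process(kb, "b", sink.append)
print(sink)  # [{'name': 'BRCA1'}, {'name': 'tp53'}]
```
      </preformat>
      <p>The early return reproduces the no-rules branch (5.a) of Figure 2, while the loop covers the processing branch (5.b.1 to 5.b.3); both branches end with a log entry.</p>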
      <p>Leveraging the Semantic Web's capabilities, this modular engine can perform
complex processing algorithms. These are divided into two categories: validation and
transformation.</p>
      <p>As the name implies, validation rules evaluate data against predefined conditions.
Validation algorithms output a Boolean value: if true (the content is valid), the integration
proceeds; if false (the content is invalid), the integration workflow stops.</p>
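      <p>The Boolean contract of validation algorithms can be made explicit in a few lines of Python. This sketch assumes, as a hypothetical design choice, that a content item is valid only if all of its validation rules hold, and that stopping the workflow is signalled by raising a hypothetical <monospace>ValidationError</monospace> exception.</p>
      <preformat>
```python
class ValidationError(Exception):
    """Hypothetical signal that stops the integration workflow."""


def is_valid(content: dict, rules: list) -> bool:
    """Output a single Boolean: True only if every validation rule holds."""
    return all(rule(content) for rule in rules)


def validate_or_stop(content: dict, rules: list) -> dict:
    """Let integration proceed (return the content) or stop it (raise)."""
    if not is_valid(content, rules):
        raise ValidationError(f"content rejected: {content!r}")
    return content


# Toy usage with two rules.
rules = [lambda c: c["score"] >= 0, lambda c: "id" in c]
validate_or_stop({"id": "x1", "score": 0.8}, rules)  # valid: integration proceeds
try:
    validate_or_stop({"id": "x2", "score": -3}, rules)
except ValidationError as err:
    print(err)  # content rejected: {'id': 'x2', 'score': -3}
```
      </preformat>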
      <p>There are two types of validation rules, based on conditional statements and on
regular expressions. Conditional validation rules assess whether the content satisfies a
predefined condition. These mimic simple arithmetic comparisons: less than (&lt;), less than
or equal to (&lt;=), greater than (&gt;), greater than or equal to (&gt;=), equal to (=), and not
equal to (!=). While these are suitable for numerical values, string-based content requires more
complex validation. For instance, e-mail addresses, URLs, UniProt accessions, or reference sequence
identifiers must match specific patterns to be valid. Hence, including regular
expressions in the validation process is essential.</p>
      <p>Transformation rules are used to generate new content from existing data. As with
the validation rules, there are two types of transformations: basic operations
and semantic transformations. Basic operations cover number and string manipulation tasks, including
mathematical equations for numbers and string concatenation or replacement for text.</p>
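      <p>A basic transformation rule can be modelled as a small declarative record interpreted by a dispatcher. In this Python sketch the rule schema (kind, a, b, prefix, suffix, old, new) and the function name <monospace>apply_basic</monospace> are hypothetical, and the "math" kind is restricted to the linear equation a*x + b purely for illustration.</p>
      <preformat>
```python
def apply_basic(value, rule: dict):
    """Apply one basic transformation rule (hypothetical rule schema).

    kind "math":    evaluate the linear equation a*x + b on a number
    kind "concat":  add a prefix and/or suffix to a string
    kind "replace": replace every occurrence of a substring
    """
    kind = rule["kind"]
    if kind == "math":
        return rule.get("a", 1) * value + rule.get("b", 0)
    if kind == "concat":
        return rule.get("prefix", "") + value + rule.get("suffix", "")
    if kind == "replace":
        return value.replace(rule["old"], rule["new"])
    raise ValueError(f"unknown transformation kind: {kind!r}")


# Toy usage.
print(apply_basic(2.5, {"kind": "math", "a": 1000}))  # 2500.0 (e.g. kb to bp)
print(apply_basic("P12345", {"kind": "concat", "prefix": "http://purl.uniprot.org/uniprot/"}))
print(apply_basic("Homo sapiens", {"kind": "replace", "old": " ", "new": "_"}))  # Homo_sapiens
```
      </preformat>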
      <p>
        Semantic transformations rely on inference and reasoning over complex ontology
rules to generate new knowledge for integration. These powerful features are
increasingly important, especially in the life sciences, as they allow intelligent
knowledge generation to be automated [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
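      <p>Semantic transformations can be illustrated with a toy forward-chaining reasoner. This Python sketch is not the authors' reasoning engine: it applies only two RDFS entailment rules (rdfs9, type propagation along rdfs:subClassOf, and rdfs11, transitivity of rdfs:subClassOf) to triples encoded as tuples until no new triples appear, and the ex: identifiers are invented examples. A production system would typically delegate this step to a dedicated RDFS or OWL reasoner.</p>
      <preformat>
```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def infer(triples: set) -> set:
    """Naive forward chaining with two RDFS entailment rules until a fixpoint.

    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A),       (A subClassOf B) => (x type B)
    """
    closure = set(triples)
    while True:
        subclass = {(s, o) for s, p, o in closure if p == SUBCLASS}
        new = set()
        for a, b in subclass:                  # rdfs11: transitivity
            for b2, c in subclass:
                if b == b2:
                    new.add((a, SUBCLASS, c))
        for x, p, a in closure:                # rdfs9: type propagation
            if p == TYPE:
                for a2, b in subclass:
                    if a == a2:
                        new.add((x, TYPE, b))
        if new.issubset(closure):              # nothing new: fixpoint reached
            return closure
        closure |= new


# Toy graph with invented ex: identifiers.
graph = {
    ("ex:BRCA1", TYPE, "ex:TumourSuppressorGene"),
    ("ex:TumourSuppressorGene", SUBCLASS, "ex:Gene"),
    ("ex:Gene", SUBCLASS, "ex:BiologicalEntity"),
}
inferred = infer(graph) - graph
for triple in sorted(inferred):
    print(triple)
```
      </preformat>
      <p>From three asserted triples, the loop derives three new ones: the transitive subclass link from ex:TumourSuppressorGene to ex:BiologicalEntity, and the two inherited types of ex:BRCA1. This newly generated knowledge could then be forwarded for integration alongside the original content.</p>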
    </sec>
    <sec id="sec-3">
      <title>Discussion</title>
      <p>Despite the ever-growing number of frameworks in this domain, several challenges
regarding the integration of information from distributed resources remain open.
Among these, improving the execution of semantic processing and validation rules
is a key opportunity to enhance data integration platforms.</p>
      <p>In this manuscript we propose a modular engine that adds a layer of semantics to
traditional data integration workflows. With this strategy, the engine enables the
execution of multiple pre- and post-integration processing algorithms, including data
validation and transformation.</p>
      <p>Acknowledgments. The research leading to these results has received funding
from the European Community (FP7/2007-2013) under ref. no. 305444 – the
RD-Connect project, and from the QREN "MaisCentro" program, ref.
CENTRO-07-ST24-FEDER-00203 – the CloudThinking project.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bruckner</surname>
            ,
            <given-names>R.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>List</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schiefer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Striving towards near real-time data integration for data warehouses</article-title>
          . Springer (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Mouttham</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peyton</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eze</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saddik</surname>
            ,
            <given-names>A.E.</given-names>
          </string-name>
          :
          <article-title>Event-Driven Data Integration for Personal Health Monitoring</article-title>
          .
          <source>Journal of Emerging Technologies in Web Intelligence</source>
          ,
          <volume>1</volume>
          (
          <issue>2</issue>
          ), Special Issue: E-health Interoperability (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Naeem</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dobbie</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Webber</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>An Event-Based Near Real-Time Data Integration Architecture</article-title>
          .
          In:
          <source>12th Enterprise Distributed Object Computing Conference Workshops</source>
          , pp.
          <fpage>401</fpage>
          -
          <lpage>404</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Teymourian</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paschke</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Semantic Rule-Based Complex Event Processing</article-title>
          . In:
          <person-group person-group-type="editor">
            <string-name>
              <surname>Governatori</surname>
              ,
              <given-names>G.</given-names>
            </string-name>
            ,
            <string-name>
              <surname>Hall</surname>
              ,
              <given-names>J.</given-names>
            </string-name>
            ,
            <string-name>
              <surname>Paschke</surname>
              ,
              <given-names>A.</given-names>
            </string-name>
          </person-group>
          (eds.)
          <source>Rule Interchange and Applications</source>
          , vol.
          <volume>5858</volume>
          , pp.
          <fpage>82</fpage>
          -
          <lpage>92</lpage>
          . Springer Berlin Heidelberg (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Anicic</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fodor</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rudolph</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stühmer</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stojanovic</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
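          <string-name>
            <surname>Studer</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>A Rule-Based Language for Complex Event Processing and Reasoning</article-title>
          . In:
          <person-group person-group-type="editor">
            <string-name>
              <surname>Hitzler</surname>
              ,
              <given-names>P.</given-names>
            </string-name>
            ,
            <string-name>
              <surname>Lukasiewicz</surname>
              ,
              <given-names>T.</given-names>
            </string-name>
          </person-group>
          (eds.)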
          <source>Web Reasoning and Rule Systems</source>
          , vol.
          <volume>6333</volume>
          , pp.
          <fpage>42</fpage>
          -
          <lpage>57</lpage>
          . Springer Berlin Heidelberg (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Paschke</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kozlenkov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Rule-Based Event Processing and Reaction Rules</article-title>
          . In:
          <person-group person-group-type="editor">
            <string-name>
              <surname>Governatori</surname>
              ,
              <given-names>G.</given-names>
            </string-name>
            ,
            <string-name>
              <surname>Hall</surname>
              ,
              <given-names>J.</given-names>
            </string-name>
            ,
            <string-name>
              <surname>Paschke</surname>
              ,
              <given-names>A.</given-names>
            </string-name>
          </person-group>
          (eds.)
          <source>Rule Interchange and Applications</source>
          , vol.
          <volume>5858</volume>
          , pp.
          <fpage>53</fpage>
          -
          <lpage>66</lpage>
          . Springer Berlin Heidelberg (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lopes</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oliveira</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          :
          <article-title>COEUS: “semantic web in a box” for biomedical applications</article-title>
          .
          <source>Journal of Biomedical Semantics</source>
          <volume>3</volume>
          ,
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Vandervalk</surname>
            ,
            <given-names>B.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCarthy</surname>
            ,
            <given-names>E.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilkinson</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          :
          <article-title>SHARE: A Web Service Based Framework for Distributed Querying and Reasoning on the Semantic Web</article-title>
          .
          <source>arXiv preprint arXiv:1305.4455</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>