<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Extending SMW+ with a Linked Data Integration Framework</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christian Becker</string-name>
          <email>chris@beckr.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian Bizer</string-name>
          <email>chris@bizer.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Erdmann</string-name>
          <email>erdmann@ontoprise.de</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mark Greaves</string-name>
          <email>markg@vulcan.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>MediaEvent Services GmbH &amp; Co. KG</institution>
          ,
          <addr-line>Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Vulcan Inc.</institution>
          ,
          <addr-line>Seattle WA</addr-line>
          ,
          <country country="US">US</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Web-based Systems Group, Freie Universität Berlin</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>ontoprise GmbH</institution>
          ,
          <addr-line>Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we present a project which extends an SMW+ semantic wiki with a Linked Data Integration Framework that performs Web data access, vocabulary mapping, identity resolution, and quality evaluation of Linked Data. As a result, a large collection of neurogenomics-relevant data from the Web can be flexibly transformed into a unified ontology, allowing unified querying, navigation, and visualization, as well as support for wiki-style collaboration, crowdsourcing, and commentary on chosen data sets.</p>
      </abstract>
      <kwd-group>
        <kwd>Linked Data</kwd>
        <kwd>Data Integration</kwd>
        <kwd>Semantic MediaWiki</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        of the project will bring ABA, Uniprot<sup>7</sup>, KEGG Pathway<sup>8</sup>, PharmGKB<sup>9</sup> and
Linking Open Drug Data [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] data sets together in order to solve the challenge
of finding drugs that target elements within a disease pathway, but are not yet
used to treat the disease. The genes associated with the found drugs may then be
compared for commonalities through an integrated analysis of the related ABA
structure expression data.
      </p>
      <sec id="sec-1-15">
        <title>Integrating Linked Data from the Web</title>
        <p>In this chapter, we discuss the architecture of the SMW-LDE deployment, which
is depicted in Figure 1.</p>
        <p>[Figure 1: diagram of the SMW-LDE architecture. Recoverable labels: Application Layer (SMW+ Semantic Enterprise Wiki); Data Access, Integration and Storage Layer (SMW+ Linked Data Integration Framework with the Web Data Access Module (LDimporter, LDspider), the R2R Vocabulary Mapping Module, the Silk Server Identity Resolution Module, the TPEE Quality Evaluation Module, and Integrated Web Data); Web of Linked Data and Publication Layer (ABA, Uniprot, KEGG Pathway, PharmGKB; RDF/XML, D2R Server). Connection labels: HTTP, SPARQL.]</p>
        <p>All base data sets for the project, including Brain Atlas data, are published
on the Web according to the Linked Data principles, thereby becoming part of
a giant global graph: the Web of Linked Data. This logical graph is depicted
in the Web of Linked Data layer in Figure 1. Non-RDF data sources such as
KEGG Pathway and PharmGKB are published using D2R Server<sup>10</sup>. The SMW+
Linked Data Integration Framework takes the role of a data access, integration
and storage layer that makes Web data available to the application (in this case,
the semantic wiki) in a unified manner. It consists of an integrated chain of Web
data access, integration, and storage modules implemented in Java and Scala
using the Jena framework. The modules and their tasks are described in the
following.</p>
        <list list-type="order">
          <list-item>
            <p><bold>Web Data Access Module.</bold> The basic means of accessing Linked Data on the
Web, namely dereferencing HTTP URIs and discovering additional data sources by
traversing RDF links, are realized using the LDspider<sup>11</sup> Linked Data crawler.
In addition, our LDimporter module can import data from various RDF
dump formats or SPARQL endpoints using data set descriptions based on
voiD [<xref ref-type="bibr" rid="ref1">1</xref>] and Semantic Sitemaps [<xref ref-type="bibr" rid="ref4">4</xref>].</p>
          </list-item>
          <list-item>
            <p><bold>Vocabulary Mapping Module.</bold> Different Linked Data sources often use
different vocabularies to represent the same type of information. For
instance, both the ABA and Uniprot ontologies contain the concept of a gene, but
associate different relations and properties with it. To make Web
data understandable to wiki users, the R2R Framework<sup>12</sup> is employed as
the vocabulary mapping module; it translates terms from different
vocabularies into the wiki ontology.</p>
          </list-item>
          <list-item>
            <p><bold>Identity Resolution Module.</bold> Different Linked Data sources use different
URIs to identify the same entity, such as a gene or a protein. The framework
integrates Silk Server<sup>13</sup> as its identity resolution module, which interlinks
newly discovered data about entities with data about them that is already
defined in the wiki or in other connected data sources.</p>
          </list-item>
          <list-item>
            <p><bold>Quality Evaluation Module.</bold> To prefer data from sources known
for good quality and to resolve data conflicts [<xref ref-type="bibr" rid="ref2">2</xref>], the framework includes a
data quality evaluation module, the Trust Policy Evaluation Engine (TPEE).</p>
          </list-item>
          <list-item>
            <p><bold>Integrated Web Data.</bold> Finally, the repository stores the Web data together
with provenance information for use by the application layer. We employ
the Named Graphs data model [<xref ref-type="bibr" rid="ref3">3</xref>] to represent Web data and its
provenance information in one integrated model.</p>
          </list-item>
        </list>
        <p><sup>7</sup> http://www.uniprot.org/</p>
        <p><sup>8</sup> http://www.genome.jp/kegg/pathway.html</p>
        <p><sup>9</sup> http://www.pharmgkb.org/</p>
        <p><sup>10</sup> http://www4.wiwiss.fu-berlin.de/bizer/d2r-server/</p>
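        <p>To make the integration chain described above more concrete, the following is a minimal sketch in Python rather than the Java and Scala of the actual framework. It does not reproduce the behaviour of R2R, Silk Server or TPEE. All names in it (QUADS, MAPPING, TRUST, the aba:, up: and wiki: terms, and the functions map_vocabulary, resolve_identity, fuse and integrate) are hypothetical and chosen for illustration only. Identity resolution is reduced to an exact match on one key property, and quality evaluation to a fixed trust score per source graph.</p>
        <preformat>
```python
# Hypothetical input quads: (subject, predicate, object, named graph).
# The graph name records which source a statement came from (provenance).
QUADS = [
    ("aba:gene42", "aba:symbol", "GENE-A", "graph:aba"),
    ("aba:gene42", "aba:chromosome", "7", "graph:aba"),
    ("up:entry9", "up:geneName", "GENE-A", "graph:uniprot"),
    ("up:entry9", "up:chromosome", "11", "graph:uniprot"),
]

# Assumed mapping from source vocabulary terms to wiki ontology terms.
MAPPING = {
    "aba:symbol": "wiki:symbol",
    "up:geneName": "wiki:symbol",
    "aba:chromosome": "wiki:chromosome",
    "up:chromosome": "wiki:chromosome",
}

# Assumed trust score per source graph (higher is preferred).
TRUST = {"graph:aba": 0.6, "graph:uniprot": 0.9}


def map_vocabulary(quads, mapping):
    """Rewrite predicates into the target ontology; unmapped terms pass through."""
    return [(s, mapping.get(p, p), o, g) for s, p, o, g in quads]


def resolve_identity(quads, key_predicate):
    """Give all subjects that share a key value one canonical URI.

    Simplification: a real system would usually emit equivalence links
    instead of rewriting URIs, and would use fuzzier comparisons.
    """
    canonical = {}  # key value to the first subject seen with that value
    alias = {}      # subject to its canonical subject
    for s, p, o, _ in quads:
        if p == key_predicate:
            alias[s] = canonical.setdefault(o, s)
    return [(alias.get(s, s), p, o, g) for s, p, o, g in quads]


def fuse(quads, trust):
    """Keep, per subject and predicate, the value from the most trusted graph."""
    ranked = sorted(quads, key=lambda q: trust.get(q[3], 0.0), reverse=True)
    best = {}
    for s, p, o, g in ranked:
        # The first quad seen for a key comes from the most trusted graph.
        best.setdefault((s, p), (s, p, o, g))
    return sorted(best.values())


def integrate(quads):
    """Run the three steps in the order of the module chain."""
    mapped = map_vocabulary(quads, MAPPING)
    linked = resolve_identity(mapped, "wiki:symbol")
    return fuse(linked, TRUST)


if __name__ == "__main__":
    for quad in integrate(QUADS):
        print(quad)
```
        </preformat>
        <p>In this sketch, each surviving statement keeps the name of the graph it was taken from, which mirrors the idea of storing provenance alongside the data. The conflicting chromosome values are resolved in favour of the graph with the higher assumed trust score.</p>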
        <sec id="sec-1-16">
          <title>Using Linked Data in the Wiki</title>
          <p>Making the integrated neurogenomics-relevant data available in a semantic wiki
empowers users to access it in various ways:</p>
          <list list-type="bullet">
            <list-item>
              <p>An ontology browser (cf. Figure 2) of SMW+ enables the exploration of the
integrated data, including representation of provenance information. In this
way, correlations between data from different sources, as well as data conflicts,
can be identified in an interactive manner.</p>
            </list-item>
            <list-item>
              <p>Inline queries (cf. Figure 2) and the interactive query interface offer means
for ad hoc queries against the set of integrated Linked Data sources. The
query results can be organized in wiki pages and complemented with textual
descriptions or interpretations.</p>
            </list-item>
          </list>
          <p>The most prominent benefit of using a semantic wiki environment in this
project is the collaborative creation of semantic meta-data via annotations.
Rather than just querying integrated Web data, users of the wiki can make
use of the data by referring to imported genes or proteins in their own
articles. By doing so, further statements about Web data resources are created that
are in turn exported as Linked Data. This includes cross data source
connections; for instance, the Allen Institute can publish their own data in the wiki
and cross-reference it to Uniprot proteins.</p>
          <p><sup>11</sup> http://code.google.com/p/ldspider/</p>
          <p><sup>12</sup> http://www4.wiwiss.fu-berlin.de/bizer/r2r/</p>
          <p><sup>13</sup> http://www4.wiwiss.fu-berlin.de/bizer/silk/server/</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Acknowledgement</title>
      <p>This work was supported in part by Vulcan Inc. as part of its Project Halo
(www.projecthalo.com).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Alexander</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hausenblas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Describing linked datasets</article-title>
          .
          <source>In Proceedings of the 2nd Workshop on Linked Data on the Web (LDOW2009)</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bleiholder</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Naumann</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <article-title>Data fusion</article-title>
          .
          <source>ACM Computing Surveys</source>
          ,
          <volume>41</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          –
          <lpage>41</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Carroll</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hayes</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stickler</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <article-title>Named graphs</article-title>
          .
          <source>Journal of Web Semantics</source>
          ,
          <volume>3</volume>
          (
          <issue>4</issue>
          ):
          <fpage>247</fpage>
          –
          <lpage>267</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delbru</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stenzhorn</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tummarello</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Decker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <article-title>Semantic sitemaps: Efficient and flexible access to datasets on the semantic web</article-title>
          .
          <source>In Proceedings of the 5th European Semantic Web Conference (ESWC2008)</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Jentzsch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hassanzadeh</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Andersson</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stephens</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <article-title>Enabling tailored therapeutics with linked data</article-title>
          .
          <source>In Proceedings of the 2nd Workshop on Linked Data on the Web (LDOW2009)</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. Krotzsch,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Vrandecic</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          , Volkel,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Haller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Studer</surname>
          </string-name>
          ,
          <string-name>
            <surname>R.</surname>
          </string-name>
          <article-title>Semantic wikipedia</article-title>
          .
          <source>Journal of Web Semantics</source>
          ,
          <volume>5</volume>
          :
          <fpage>251</fpage>
          {
          <fpage>261</fpage>
          ,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>