<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LinkedCT Live: Platform for Online Curation of Clinical Trials Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oktie Hassanzadeh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Renee J. Miller</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fatemeh Nargesian</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Erkang Zhu</string-name>
          <email>ekzhu@cs.toronto.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Toronto</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The goal of the Linked Clinical Trials (LinkedCT) project is to transform the data published on ClinicalTrials.gov into a high-quality knowledge base published as Linked Data on the Web. In this demonstration, we present the platform we have developed both for online curation of clinical trials data into linked data and for rapid Web application development on top of this linked data. We also show a few sample applications built using this platform. We have made the project open-source and invite researchers and healthcare professionals to develop applications that will be hosted on LinkedCT.org.</p>
      </abstract>
      <kwd-group>
        <kwd>Clinical Trials</kwd>
        <kwd>Linked Data</kwd>
        <kwd>Semantic Healthcare Applications</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        ClinicalTrials.gov is a large and widely used registry of international clinical
studies, which is maintained by the U.S. National Institutes of Health (NIH). The
registry contains detailed information about the studies including recruitment
information, eligibility criteria, and clinical outcomes. This data has tremendous
value to the clinical and healthcare research communities [
        <xref ref-type="bibr" rid="ref11 ref12 ref8 ref9">8,9,11,12</xref>
        ]. In 2008, we
began the Linked Clinical Trials (LinkedCT) project to study how the value of
this data could be enhanced by publishing the ClinicalTrials.gov data as
high-quality (5-star [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]) Linked Data. Initially, the data transformation from XML to
RDF was a static, manually designed process that contained errors and resulted
in quality issues and incomplete data. As a result, we later redesigned the project
with the goal of having an automatic data curation process with online
transformation of XML data into Linked Data. In what follows, we briefly describe this
platform. We then describe our demonstration plan, which includes showcasing
a number of lightweight applications built on top of LinkedCT data and hosted
on LinkedCT.org.
      </p>
      <p>
        This demonstration complements our full paper [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and was supported in part by
NSERC BIN.
      </p>
      <p>[Figure 1: LinkedCT architecture. Components shown: ClinicalTrials.gov (XML source), Crawler, xCurator Mapping API, Mapping Engine, Models Layer (Django’s ORM), RDBMS, D2RQ, D2R Server (RDF &amp; SPARQL requests), Views &amp; Template Layer, Web Server, and the Data Browse Interface on LinkedCT.org (Entity Browse, Entity Search, Simple SPARQL, Keyword Search (future), GeoSearch (future)).]</p>
      <p>
The architecture of LinkedCT is depicted in Figure 1. Our platform provides
end-to-end data curation that consumes XML data and produces current
(up-to-date) Linked RDF data. This linked data can be accessed directly through
links on the website or SPARQL queries, or from a number of applications that
we will demonstrate.</p>
      <p>
        Creating a High-Quality Knowledge Graph. The creation of a knowledge
graph is not simply a matter of translating XML (with its current structure)
into RDF. Rather, an important part of the curation process is the creation of
a structure (schema) that both accurately models the semantics of the original
data and also adds value to the data by facilitating the linking of the data to
external sources. The use of an automated translation tool (e.g., XSPARQL [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ])
or a manual translation process such as the one originally employed in LinkedCT
can result in data quality issues. In LinkedCT, we discovered many of these issues
by analyzing user reports and by using LinkedCT in several LODD projects [
        <xref ref-type="bibr" rid="ref5 ref6">5,6</xref>
        ].
- Our users reported missing entity types (RDF classes) or missing attributes
(RDF properties). Users may have had knowledge of classes and
properties in related NIH sources or other health data sources that they expected
to see represented in the same way in LinkedCT. A manual or automated
translation may not always create the classes and properties users expect.
- An application or use-case for LinkedCT required that a literal property
(such as the string-valued property location_countries) be represented as
an entity type (the RDF class Country). This requirement may come from
a desire to link data to external sources (using the URIs of the entities).
- There may be inconsistencies in the XML, such as the same semantic
information being represented by different XML labels (and, as a result, different
entity types in an RDF graph derived directly from XML elements). Such
inconsistencies should be reconciled to create a well-designed knowledge graph.
      </p>
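      <p>As an illustration of the second issue above, the following minimal Python sketch promotes a string-valued country property to Country entities with their own URIs. It is not the LinkedCT or xCurator implementation: the sample record, the base URI, and the property and class names used in the triples are assumptions made only for this example.</p>
      <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Hypothetical base URI for Country resources (an assumption, not LinkedCT's).
BASE = "http://example.org/resource/country/"

# Made-up record, loosely modelled on a clinical study XML file.
SAMPLE = """
<clinical_study>
  <id_info><nct_id>NCT00000000</nct_id></id_info>
  <location_countries>
    <country>Canada</country>
    <country>United States</country>
  </location_countries>
</clinical_study>
"""


def country_uri(name):
    """Mint a URI for a country name so it can be linked to external sources."""
    slug = quote(name.strip().lower().replace(" ", "_"))
    return BASE + slug


def promote_countries(xml_text):
    """Return (subject, predicate, object) triples with countries as entities."""
    root = ET.fromstring(xml_text)
    trial = root.findtext("id_info/nct_id")
    triples = []
    for node in root.findall("location_countries/country"):
        uri = country_uri(node.text)
        # The trial points at an entity URI instead of carrying a string.
        triples.append((trial, "location_country", uri))
        triples.append((uri, "rdf:type", "Country"))
        triples.append((uri, "rdfs:label", node.text.strip()))
    return triples


triples = promote_countries(SAMPLE)
for triple in triples:
    print(triple)
```
]]></preformat>
      <p>Because each country now has a URI of its own, a later linking step could attach links to external sources to that URI rather than to every trial that mentions the country.</p>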
      <p>
        To address these issues, we needed to go beyond automated translation to
create a schema and mapping discovery system. xCurator [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is an end-to-end
system for transforming semi-structured data into a well-designed linked
knowledge base. LinkedCT was the main evaluation platform for xCurator. In this
demonstration, we will use a new lightweight web application that uses the
output of the mapping generator component of xCurator and can run on modest
hardware or virtual machines. This component uses the Crawler module
(Figure 1) that monitors ClinicalTrials.gov for new trials and for updates to existing
trials. In the Model Generator, the mappings created by the xCurator mapping
generator are translated into an intermediate Object-Relational Mapper (ORM)
model definition (implemented in the Django framework [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]). We are then able
to use a reliable relational backend to store our data while using D2R Server [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
to generate RDF. Moreover, the Django framework facilitates rapid application
development and provides an infrastructure to host applications on a common
platform where the data resides.
      </p>
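      <p>The flow from mappings to a relational store to RDF can be sketched as follows. This is only a rough approximation under stated assumptions: Python's built-in sqlite3 module stands in for both Django's ORM and the relational backend, a plain function stands in for D2R Server, and the mapping format and base URI are hypothetical rather than the actual xCurator output.</p>
      <preformat><![CDATA[
```python
import sqlite3

# Hypothetical mapping: entity type -> attribute names. This dict is only a
# stand-in for the output of a mapping generator.
MAPPING = {
    "trial": ["nct_id", "brief_title"],
    "country": ["name"],
}


def create_schema(conn, mapping):
    """Create one table per entity type, with one text column per attribute."""
    for entity, attributes in mapping.items():
        columns = ", ".join(f"{name} TEXT" for name in attributes)
        conn.execute(f"CREATE TABLE {entity} (id INTEGER PRIMARY KEY, {columns})")


def insert_entity(conn, entity, values):
    """Insert one row; values maps attribute names to literal values."""
    names = ", ".join(values)
    marks = ", ".join("?" for _ in values)
    cursor = conn.execute(
        f"INSERT INTO {entity} ({names}) VALUES ({marks})", list(values.values())
    )
    return cursor.lastrowid


def rows_as_triples(conn, mapping, base="http://example.org/resource/"):
    """Render every stored row as (subject URI, property, literal) triples."""
    triples = []
    for entity, attributes in mapping.items():
        query = f"SELECT id, {', '.join(attributes)} FROM {entity}"
        for row in conn.execute(query):
            subject = f"{base}{entity}/{row[0]}"
            for name, value in zip(attributes, row[1:]):
                triples.append((subject, name, value))
    return triples


conn = sqlite3.connect(":memory:")
create_schema(conn, MAPPING)
insert_entity(conn, "trial", {"nct_id": "NCT00000000", "brief_title": "Example study"})
insert_entity(conn, "country", {"name": "Canada"})
triples = rows_as_triples(conn, MAPPING)
for triple in triples:
    print(triple)
```
]]></preformat>
      <p>Keeping the data in relational tables means that the same store can serve both the RDF view and ordinary web application queries, which is the design choice described above.</p>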
    </sec>
    <sec id="sec-2">
      <title>Demonstration Plan</title>
      <p>Our goal in this demonstration is threefold. First, we will demonstrate the online
transformation of XML data into high-quality Linked Data. To this end, we will
use a secondary server with an empty database and submit XML
data to it to show the transformation process. We will then make changes
to an already processed XML file and show how the system handles different
changes in the input data. Our goal is to illustrate the importance of having a
dynamic (or online) mapping discovery process that can adapt to changes in the
data and its structure.</p>
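      <p>To make the distinction between changes in the data and changes in its structure concrete, the following sketch compares two versions of an XML record. It is an illustrative assumption about how such changes could be classified, not a description of the system's actual change handling; the element names and values are invented for the example.</p>
      <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET


def leaf_paths(xml_text):
    """Map each leaf element path to the list of text values found there."""
    paths = {}

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        children = list(node)
        if not children:
            paths.setdefault(path, []).append((node.text or "").strip())
        for child in children:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def diff_versions(old_xml, new_xml):
    """Classify changes as new structure, removed structure, or changed values."""
    old, new = leaf_paths(old_xml), leaf_paths(new_xml)
    shared = set(old) & set(new)
    return {
        "added_paths": sorted(set(new) - set(old)),
        "removed_paths": sorted(set(old) - set(new)),
        "changed_values": sorted(p for p in shared if old[p] != new[p]),
    }


# Two made-up versions of the same record.
OLD = "<clinical_study><overall_status>Recruiting</overall_status></clinical_study>"
NEW = (
    "<clinical_study><overall_status>Completed</overall_status>"
    "<enrollment>120</enrollment></clinical_study>"
)

changes = diff_versions(OLD, NEW)
print(changes)
```
]]></preformat>
      <p>In this sketch, a changed value would only require updating stored data, whereas an added or removed path signals a structural change that a mapping discovery process would need to revisit.</p>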
      <p>Second, an important goal of LinkedCT is to study how the value of open data
can be enhanced by publishing it as curated Linked Data. In the demonstration,
we will show examples from the ClinicalTrials.gov source and its keyword search
interface. We will contrast these with results from a search over LinkedCT to
show how high-quality Linked Data can enable and enhance various exploration
and analysis tasks.</p>
      <p>Our final goal is to show how new applications can be developed on top of
our platform. We will use our simple “Live Statistics” page (http://linkedct.org/stats/)
as a sample “Getting Started” application and show how it can be
written and deployed in only a few minutes. We will also show more advanced
applications that are currently under development by our team: 1) a Geo-Search
interface that shows clinical studies involving a given condition or medication in
the proximity of a given location, and 2) a fuzzy look-ahead semantic keyword
search interface powered by SRCH2 (http://srch2.com).</p>
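      <p>A proximity search of the kind described for the Geo-Search interface could be approximated as below. This is a minimal sketch assuming that each study site carries latitude and longitude values; the record layout, identifiers, and coordinates are hypothetical, and the great-circle (haversine) distance is one common choice rather than necessarily the one used in the application.</p>
      <preformat><![CDATA[
```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# Hypothetical study sites with made-up identifiers.
TRIALS = [
    {"id": "NCT-A", "condition": "asthma", "lat": 43.6532, "lon": -79.3832},
    {"id": "NCT-B", "condition": "asthma", "lat": 45.5019, "lon": -73.5674},
    {"id": "NCT-C", "condition": "diabetes", "lat": 43.7000, "lon": -79.4000},
]


def geo_search(trials, condition, lat, lon, max_km):
    """Return ids of trials for a condition within max_km, nearest first."""
    hits = []
    for trial in trials:
        if trial["condition"] != condition:
            continue
        distance = haversine_km(lat, lon, trial["lat"], trial["lon"])
        if distance <= max_km:
            hits.append((distance, trial["id"]))
    return [trial_id for _, trial_id in sorted(hits)]


nearby = geo_search(TRIALS, "asthma", 43.66, -79.39, 50)
print(nearby)
```
]]></preformat>
      <p>Filtering on the condition before computing distances keeps the sketch simple; a deployed interface would more likely push both filters into the database or a spatial index.</p>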
      <p>We have made the platform open-source, and an important goal of the
demonstration is to encourage the community to implement new applications that can
be hosted on LinkedCT.org to further add value to the Clinical Trials data.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Django Web Framework, https://www.djangoproject.com/. [Online; accessed 07-07-2015].</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Tim</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          .
          <article-title>Linked Data - Design Issues</article-title>
          . http://www.w3.org/DesignIssues/LinkedData.html,
          <year>2006</year>
          . [Online; accessed 07-07-2015].
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          .
          <article-title>D2R Server - Publishing Relational Databases on the Semantic Web</article-title>
          .
          <source>In Proc. of the Int'l Semantic Web Conference (ISWC)</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>O.</given-names>
            <surname>Hassanzadeh</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Miller</surname>
          </string-name>
          .
          <article-title>Automatic Curation of Clinical Trials Data in LinkedCT</article-title>
          .
          <source>In Proc. of the Int'l Semantic Web Conference (ISWC)</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>A.</given-names>
            <surname>Jentzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Andersson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hassanzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Stephens</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          .
          <article-title>Enabling Tailored Therapeutics with Linked Data</article-title>
          .
          <source>In Proceedings of the WWW2009 workshop on Linked Data on the Web (LDOW2009)</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Anja</given-names>
            <surname>Jentzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jun</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hassanzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kei-Hoi</given-names>
            <surname>Cheung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Matthias</given-names>
            <surname>Samwald</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Bosse</given-names>
            <surname>Andersson</surname>
          </string-name>
          .
          <article-title>Linking Open Drug Data (Triplification Challenge Report)</article-title>
          .
          <source>In Proceedings of the International Conference on Semantic Systems (ISEMANTICS'09)</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Krennwallner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lopes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kopecky</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Decker</surname>
          </string-name>
          .
          <article-title>XSPARQL Language Specification</article-title>
          . http://www.w3.org/Submission/xsparql-language-specification/. [Online; accessed 07-07-2015].
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Prayle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Hurley</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Smyth</surname>
          </string-name>
          .
          <article-title>Compliance with Mandatory Reporting of Clinical Trial Results on ClinicalTrials.gov: Cross Sectional Study</article-title>
          .
          <source>BMJ</source>
          ,
          <volume>344</volume>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Ross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Zarin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H. M.</given-names>
            <surname>Krumholz</surname>
          </string-name>
          .
          <article-title>Publication of NIH Funded Trials Registered in ClinicalTrials.gov: Cross Sectional Analysis</article-title>
          .
          <source>BMJ</source>
          ,
          <volume>344</volume>
          ,
          <year>January 2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>S.</given-names>
            <surname>Hassas Yeganeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hassanzadeh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Miller</surname>
          </string-name>
          .
          <article-title>Linking Semistructured Data on the Web</article-title>
          .
          <source>In Proc. of the Int'l Workshop on the Web and Databases (WebDB)</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Zarin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tse</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N. C.</given-names>
            <surname>Ide</surname>
          </string-name>
          .
          <article-title>Trial Registration at ClinicalTrials.gov between May and October 2005</article-title>
          .
          <source>New England Journal of Medicine</source>
          ,
          <volume>353</volume>
          (
          <issue>26</issue>
          ):
          <fpage>2779</fpage>
          –
          <lpage>2787</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Zarin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Cali</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N. C.</given-names>
            <surname>Ide</surname>
          </string-name>
          .
          <article-title>The ClinicalTrials.gov Results Database – Update and Key Issues</article-title>
          .
          <source>New England Journal of Medicine</source>
          ,
          <volume>364</volume>
          (
          <issue>9</issue>
          ):
          <fpage>852</fpage>
          –
          <lpage>860</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>