<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Data Integration Framework of Pharmacology Databases Using Ontology</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Phimphan Thipphayasaeng</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Poonpong Boonbrahm</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marut Buranarach</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anunchai Assawamakin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Pharmacy, Mahidol University</institution>
          ,
          <addr-line>Bangkok</addr-line>
          ,
          <country country="TH">Thailand</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Electronics and Computer Technology Center (NECTEC)</institution>
          ,
          <addr-line>Pathumthani</addr-line>
          ,
          <country country="TH">Thailand</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>School of Informatics, Walailak University</institution>
          ,
          <addr-line>NakornsiThammarat</addr-line>
          ,
          <country country="TH">Thailand</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents linked data for the pharmacology domain, generated with an ontology as the central schema. To link data that come in several formats, the data are first transformed into database format and then mapped to the ontology. The ontology is developed from the concepts found in the datasets; its main concepts are drug, disease, genetic data and drug-gene interaction, together with their details. Using the ontology as a central schema links these concepts together and thereby creates the linked data. Within the linked data, we found three types of links: addition of instances, addition of attributes, and replacement of a plain-value data field with a link to another table. Actual scenarios of these links are explained with example data.</p>
      </abstract>
      <kwd-group>
        <kwd>Linked data</kwd>
        <kwd>Ontology</kwd>
        <kwd>Pharmacological data</kwd>
        <kwd>Data integration</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In the field of pharmacology, data about drugs, their usage, related studies, and descriptions have been
digitized and made available on many sites. Each source describes different characteristics of drugs;
hence, each database schema was designed specifically for its own purpose.
These are also live data that are regularly updated for experts to reference.
The data sources are open to use, and experts in the field rely on them to look up
case results and to extend their research.</p>
      <p>These databases contain a large amount of data, and their schemas are
complex. Users need to understand each database schema and have a pharmacological
background to read through the data. Moreover, the data in each source were gathered according to the
purpose for which that source was designed, so every database has its own strengths and is specific
to its locality. In practice, experts commonly have to search through many databases
to ensure the correctness and coverage of the data, and they must be aware of different
terms that refer to the same concept or instance (synonymy) and of a single term with
several definitions (polysemy).</p>
      <p>
        To support users of these data, this work aims to link open pharmaco-genetic
data together using the Resource Description Framework (RDF) standard. RDF
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is often used as a metadata interchange format because it was designed
to represent information as a model that can be expressed in a variety of syntax notations and data
serialization formats. This work proposes a method for interoperability between
datasets with different formats and schemas using the Linked Data framework. In addition, the
integration method is designed to recognize overlapping data and to
extend the range of the data with other data sources. A complete data integration solution should
make the data more trustworthy through cross-checking against a variety of sources,
since the volume of data increases and the coverage of scope is extended.
      </p>
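      <p>As a minimal illustration of the RDF data model mentioned above, the following sketch writes two subject-predicate-object statements in the N-Triples serialization. The namespace, identifiers and property names are hypothetical and are not taken from the datasets used in this work.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Hypothetical namespace and identifiers, for illustration only.
BASE = "http://example.org/pharma/"


def uri(local_name: str) -> str:
    """Wrap a local name in N-Triples angle brackets under the example namespace."""
    return f"<{BASE}{local_name}>"


def literal(text: str) -> str:
    """Format a plain string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = [
    (uri("drug/D1"), uri("name"), literal("example drug")),   # data value
    (uri("drug/D1"), uri("treats"), uri("disease/S1")),       # link to another resource
]
print(to_ntriples(triples))
```
]]></preformat>
      <p>The second statement is the kind of link that connects a record in one source to a record in another once both are expressed in RDF.</p>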
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>Pharmacology is the study of drug action and the effects of drugs on biological
systems. Several linked datasets have been created and distributed as accessible data
for experts in the field. In this section, we review some of the well-known linked data in
pharmacology and summarize them as follows.</p>
      <p>
        ChEMBL [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] provides data on chemical entities and their biological activities against drug targets,
which can be used as a reference by drug researchers. DrugBank [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] gathers
information on drugs and their targets for use in drug target discovery, drug design,
drug screening or docking, interaction prediction, metabolism prediction and
pharmaceutical education. Diseasome [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] provides a resource for biomedical researchers that
presents disease-gene associations as network maps and supports understanding of the genetic
origins of disease. DisGeNET [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] represents knowledge of the molecular
mechanisms of disease by combining detailed gene data with disease data, and it calculates a score to
rank these associations in support of biomedical research. The Linking
Open Drug Data project focuses on linking various sources of drug data to answer
scientific questions [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Methodology</title>
      <p>This work aims to combine pharmacological data provided by several sources into
linked information. An ontology is chosen as an intermediate schema to relate the data to
uniform concepts. The method involves four processes, as summarized in Fig. 1.</p>
      <p>Fig. 1. An overview of data integration of pharmacology databases using ontology</p>
      <sec id="sec-3-1">
        <title>Data preparation</title>
        <p>
          In this paper, five pharmacological datasets, i.e. KEGG Disease [
          <xref ref-type="bibr" rid="ref7 ref8 ref9">7, 8, 9</xref>
          ], ThaiSNP
[
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], the Comparative Toxicogenomics Database (CTD) [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], DrugBank [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] and MeSH [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], are
chosen as input data for integration. The details of the datasets are given in Table 1.
        </p>
        <p>What the datasets in Table 1 have in common is that they concern drugs, genes, and diseases.
The datasets come in several data formats, such as database and XML. Note
that they also contain some fields used for data management and
referencing, such as sorting keys and IDs for their related applications, which we ignore
during data integration since they are not semantically important. To combine these data,
we first convert them into a uniform database format. These databases are
mapped to the ontology in a later process.</p>
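        <p>The preparation step can be sketched as follows, assuming one source arrives as XML and another as tab-separated text. The sample records, field names and the single target table are hypothetical; the sketch only illustrates dropping management fields and loading both sources into one database format (here an in-memory SQLite database).</p>
        <preformat preformat-type="code"><![CDATA[
```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample inputs; the real datasets have their own field names.
XML_SOURCE = """<drugs>
  <drug id="1" sortkey="a"><name>drug-A</name><target>gene-X</target></drug>
  <drug id="2" sortkey="b"><name>drug-B</name><target>gene-Y</target></drug>
</drugs>"""
TSV_SOURCE = "name\ttarget\tsortkey\ndrug-C\tgene-Z\tc\n"

# Only semantically relevant fields are kept; id and sortkey are ignored.
KEPT_FIELDS = ("name", "target")


def rows_from_xml(text):
    """Extract the kept fields from every drug element."""
    root = ET.fromstring(text)
    return [tuple(drug.findtext(f, default="") for f in KEPT_FIELDS)
            for drug in root.iter("drug")]


def rows_from_tsv(text):
    """Extract the kept fields from tab-separated rows with a header line."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [tuple(row[f] for f in KEPT_FIELDS) for row in reader]


def load(conn, rows):
    """Insert rows into a common table, creating it on first use."""
    conn.execute("CREATE TABLE IF NOT EXISTS drug (name TEXT, target TEXT)")
    conn.executemany("INSERT INTO drug VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, rows_from_xml(XML_SOURCE))
load(conn, rows_from_tsv(TSV_SOURCE))
all_rows = conn.execute("SELECT name, target FROM drug ORDER BY name").fetchall()
print(all_rows)
```
]]></preformat>
        <p>In practice each source would more likely keep its own table, since the tables are related to one another only later through the ontology.</p>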
      </sec>
      <sec id="sec-3-2">
        <title>Ontology design</title>
        <p>
          Since the core of the aforementioned data is drugs, we start with the Drug
concept as the main class and expand relations from this concept. Our ontology was
designed and created with the Hozo ontology editor [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] following the development
guideline by Mizoguchi [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
        <p>First, terms in these datasets were gathered from the table headings and fields. From
the gathered terms, concepts were identified and the relations linking them
were decided. Relations between concepts were decided based on the following criteria:</p>
        <list list-type="bullet">
          <list-item>
            <p>is-a relation: forms a superclass-subclass relation in which the concepts must
be of the same kind, and all properties of the superclass are inherited by its subclasses</p>
          </list-item>
          <list-item>
            <p>object property: forms a belonging relation between two concepts, in which
one concept is represented as a part of another concept</p>
          </list-item>
          <list-item>
            <p>data property: forms a concept-to-datatype relation, signifying that a concept
contains a value such as a number or a string</p>
          </list-item>
          <list-item>
            <p>instance-of: links a concept to real data, i.e. an instance; this
relation connects an ontology class to the real data in a database</p>
          </list-item>
        </list>
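        <p>The four relation types above can be pictured with a small data structure. This is only an illustrative sketch and not the Hozo representation; the concept names Substance and Interaction as used here, and all property names, are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    name: str
    parent: Optional["Concept"] = None                 # is-a relation
    object_props: dict = field(default_factory=dict)   # property name -> related Concept
    data_props: dict = field(default_factory=dict)     # property name -> value datatype
    instances: list = field(default_factory=list)      # instance-of: rows of real data

    def all_properties(self) -> dict:
        """Own properties plus everything inherited along the is-a chain."""
        inherited = self.parent.all_properties() if self.parent else {}
        return {**inherited, **self.object_props, **self.data_props}


# Hypothetical concepts and property names, for illustration only.
interaction = Concept("Interaction", data_props={"description": str})
substance = Concept("Substance", data_props={"name": str})
drug = Concept("Drug", parent=substance, object_props={"interaction": interaction})
drug.instances.append({"name": "drug-A"})

print(sorted(drug.all_properties()))
```
]]></preformat>
        <p>Here Drug inherits the data property of its superclass through the is-a relation, while its object property points to another concept rather than to a plain value.</p>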
        <p>
          Based on the data in the datasets, our ontology is designed to gather all the concepts. The
major concepts are Drug, Gene, Disease, Interaction and SNP, and their properties are
the fields of their datasets. Some parts of the ontology are shown in Fig. 2.
        </p>
        <p>
          With the ontology as a schema that gathers and relates concepts from the datasets, the ontology
can be mapped to all data in the datasets. The ontology mapping tool of the Ontology
Application Management (OAM) framework [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] is chosen to help with the mapping. The mapping between
ontology classes and dataset fields is exemplified in Table 2.
        </p>
        <p>
          To combine these data, we focus on what the data and concepts have in common. In this
work, we found three types of integration between different datasets: (1) same head concept,
different data; (2) same head concept, different properties; and (3) one concept is
a property of another.
        </p>
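        <p>The idea of mapping dataset fields to ontology classes and properties can be sketched as a lookup table. This is not the OAM tool's interface; the dataset names, table names, field names and property names below are hypothetical placeholders.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Hypothetical mapping: (dataset, table) -> ontology class and field-to-property renames.
MAPPING = {
    ("source_a", "drugs"): ("Drug", {"drug_name": "name", "organ": "affectedOrgan"}),
    ("source_b", "chemicals"): ("Drug", {"chem_name": "name", "action": "pharmAction"}),
}


def to_ontology_instance(dataset: str, table: str, row: dict) -> dict:
    """Rename mapped fields to ontology properties; unmapped fields are ignored."""
    onto_class, field_map = MAPPING[(dataset, table)]
    instance = {"class": onto_class}
    for field_name, prop in field_map.items():
        if field_name in row:
            instance[prop] = row[field_name]
    return instance


a = to_ontology_instance("source_a", "drugs",
                         {"drug_name": "drug-A", "organ": "liver", "sort": 3})
b = to_ontology_instance("source_b", "chemicals",
                         {"chem_name": "drug-A", "action": "inhibitor"})
print(a["class"] == b["class"])
```
]]></preformat>
        <p>Although the two tables carry different labels, both rows end up as instances of the same ontology class, which is what makes the later integration steps possible.</p>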
        <p>The first type of integration results in more instances. It
expands the variety of the data and benefits users by giving them more data to look through. The
second type links more attributes to the same data. It widens
the aspects covered and includes more information relevant to the concepts, which
gives users a broader view of the data properties. The last type integrates different
tables. It gives in-depth details, since a single value or text in one
attribute is replaced by a link to a whole table that describes it
in more detail. The three types of integration are summarized in Table 3.</p>
        <p>In this section, we demonstrate a case scenario of the data linking types to exemplify
actual cases with the datasets chosen for integration in this work.</p>
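        <p>One possible way to detect which of the three integration types applies to a pair of mapped tables is sketched below. The table descriptions and the decision rules are assumptions made for illustration; they are a reading of the three types, not a procedure stated in this work.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class MappedTable:
    concept: str                                # ontology class the table maps to
    properties: set                             # ontology properties it provides
    keys: set                                   # identifiers of its instances
    links: dict = field(default_factory=dict)   # property -> concept its values refer to


def integration_types(a: MappedTable, b: MappedTable) -> set:
    """Return which of the three integration types apply to tables a and b."""
    found = set()
    if a.concept == b.concept:
        if a.keys ^ b.keys:               # instances present in only one table
            found.add(1)
        if a.properties ^ b.properties:   # attributes present in only one table
            found.add(2)
    if b.concept in a.links.values() or a.concept in b.links.values():
        found.add(3)                      # a plain field can become a link to the other table
    return found


# Hypothetical table descriptions, for illustration only.
drugs_a = MappedTable("Drug", {"name", "affectedOrgan"}, {"drug-A", "drug-B"},
                      {"treats": "Disease"})
drugs_b = MappedTable("Drug", {"name", "pharmAction"}, {"drug-A", "drug-C"})
diseases = MappedTable("Disease", {"name", "category"}, {"dis-1"})

print(integration_types(drugs_a, drugs_b))
print(integration_types(drugs_a, diseases))
```
]]></preformat>
        <p>A single pair of tables can fall under more than one type, as the first pair in the example does.</p>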
        <p>The scenario is an integration of DrugBank and CTD. Both datasets
contain data about drugs. In DrugBank, the
data concern the chemical compound of a drug and the organ it affects. The pharmaceutical action
of a drug, however, is given in CTD. Each set of attributes is absent from the other dataset, so
this case is recognized as the second type in Table 3. With the ontology mapped to
these data, we can tell that the two tables represent the same concept, since both are mapped to
the same ontology class although their table labels differ. After integration, we
obtain a combination of properties that widens the aspects covered for the same data.</p>
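        <p>The attribute-level integration in this scenario can be sketched as a merge of records that describe the same instance. The records below are invented placeholders that are assumed to have already been renamed to ontology properties; they are not actual DrugBank or CTD content.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def merge_attributes(left: list, right: list, key: str = "name") -> list:
    """Combine records describing the same instance, keeping attributes from both sides."""
    merged = {rec[key]: dict(rec) for rec in left}
    for rec in right:
        target = merged.setdefault(rec[key], {})
        for prop, value in rec.items():
            target.setdefault(prop, value)   # do not overwrite values already present
    return [merged[k] for k in sorted(merged)]


# Placeholder records; all values are invented for illustration.
from_drugbank = [{"name": "drug-A", "compound": "C-1", "affectedOrgan": "liver"}]
from_ctd = [{"name": "drug-A", "pharmAction": "inhibitor"}]

combined = merge_attributes(from_drugbank, from_ctd)
print(combined)
```
]]></preformat>
        <p>Matching on a shared name is a simplification; synonymous drug names would first have to be resolved to the same instance.</p>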
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion and Future Work</title>
      <p>This paper presents an integration of pharmacology data from several sources using
an ontology as a central schema. Five datasets providing data about
drugs, diseases, genes and interactions are gathered. To integrate the data, they are first cleaned and
converted into a uniform format. An ontology is created from the concepts given in the datasets and is
mapped to the data via the OAM framework. With the ontology mapped to the data, data from
the five sources are semantically linked. The resulting linked data contain three
types of links: addition of instances, addition of attributes, and replacement of a
plain-value data field with a link to another table. In the future, we plan to include more relevant
datasets in order to link more pharmacology data. An automatic method for mapping data to ontology
classes will be investigated to reduce the human effort and time spent on the mapping
process. Lastly, we will apply semantic search to the obtained linked data.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>de Bruijn</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Welty</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>RIF RDF and OWL Compatibility</article-title>
          . Available online at https://www.w3.org/TR/2013/REC-rif-rdf-owl-20130205/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Willighagen</surname>
            <given-names>EL</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Waagmeester</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spjuth</surname>
            <given-names>O</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ansell</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            <given-names>AJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tkachenko</surname>
            <given-names>V</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hastings</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            <given-names>B</given-names>
          </string-name>
          and
          <string-name>
            <surname>Wild</surname>
            <given-names>DJ</given-names>
          </string-name>
          :
          <article-title>The ChEMBL database as linked open data</article-title>
          .
          <source>Journal of Cheminformatics</source>
          .
          <volume>5</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Wishart</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knox</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shrivastava</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hassanali</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stothard</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Woolsey</surname>
            <given-names>J.:</given-names>
          </string-name>
          <article-title>DrugBank: a comprehensive resource for in silico drug discovery and exploration</article-title>
          .
          <source>Nucleic Acids Res</source>
          .
          <volume>34</volume>
          ,
          <fpage>D668</fpage>
          -
          <lpage>D672</lpage>
          (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Wysocki</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ritter</surname>
            <given-names>L</given-names>
          </string-name>
          :
          <article-title>Diseasome: an approach to understanding gene-disease interactions</article-title>
          .
          <source>Annu Rev Nurs Res</source>
          .
          <volume>29</volume>
          ,
          <fpage>55</fpage>
          -
          <lpage>72</lpage>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Piñero</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Queralt-Rosinach</surname>
            <given-names>N</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bravo</surname>
            <given-names>À</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deu-Pons</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bauer-Mehren</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baron</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanz</surname>
            <given-names>F</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Furlong</surname>
            <given-names>LI</given-names>
          </string-name>
          :
          <article-title>DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes</article-title>
          .
          <source>Database: The Journal of Biological Databases and Curation</source>
          .
          <volume>2015</volume>
          ,
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Samwald</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jentzsch</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bouton</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kallesøe</surname>
            <given-names>CS.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Willighagen</surname>
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hajagos</surname>
            <given-names>J.</given-names>
          </string-name>
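          ,
          <string-name>
            <surname>Marshall</surname>
            <given-names>MS.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prud'hommeaux</surname>
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hassanzadeh</surname>
            <given-names>O.</given-names>
          </string-name>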
          ,
          <string-name>
            <surname>Pichler</surname>
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stephens</surname>
            <given-names>S.:</given-names>
          </string-name>
          <article-title>Linked open drug data for pharmaceutical research and development</article-title>
          .
          <source>J Cheminform</source>
          .
          <volume>3</volume>
          , (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Kanehisa</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goto</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Furumichi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tanabe</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Hirakawa</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>KEGG for representation and analysis of molecular networks involving diseases and drugs</article-title>
          .
          <source>Nucleic Acids Res</source>
          .
          <volume>38</volume>
          ,
          <fpage>D355</fpage>
          -
          <lpage>D360</lpage>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Kanehisa</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sato</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kawashima</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Furumichi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Tanabe</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>KEGG as a reference resource for gene and protein annotation</article-title>
          .
          <source>Nucleic Acids Res</source>
          .
          <volume>44</volume>
          ,
          <fpage>D457</fpage>
          -
          <lpage>D462</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Kanehisa</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Goto</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>KEGG: Kyoto Encyclopedia of Genes and Genomes</article-title>
          .
          <source>Nucleic Acids Res</source>
          .
          <volume>28</volume>
          ,
          <fpage>27</fpage>
          -
          <lpage>30</lpage>
          (
          <year>2000</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Hattirat</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngamphiw</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Assawamakin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Tongsima</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Catalog of Genetic Variations (SNPs and CNVs) and Analysis Tools for Thai Genetic Studies</article-title>
          .
          <source>Computational Systems-Biology and Bioinformatics</source>
          .
          <volume>115</volume>
          ,
          <fpage>130</fpage>
          -
          <lpage>140</lpage>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Davis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grondin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lennon-Hopkins</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saraceni-Richards</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sciaky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>King</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiegers</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mattingly</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>The Comparative Toxicogenomics Database's 10th year anniversary: Update 2015</article-title>
          .
          <source>Nucleic Acids Res</source>
          .
          <volume>43</volume>
          ,
          <fpage>D914</fpage>
          -
          <lpage>D920</lpage>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Rogers</surname>
            ,
            <given-names>F.B.</given-names>
          </string-name>
          :
          <article-title>Communications to the Editor</article-title>
          .
          <source>Bulletin of the Medical Library Association</source>
          .
          <volume>70</volume>
          ,
          <fpage>5131</fpage>
          -
          <lpage>5136</lpage>
          (
          <year>1963</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] Hozo Ontology Editor: available online at http://www.hozo.jp</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Mizoguchi</surname>
            ,
            <given-names>R.:</given-names>
          </string-name>
          <article-title>Tutorial on ontological engineering - part 1: Introduction to ontological engineering</article-title>
          .
          <source>New Generation Computing</source>
          .
          <volume>21</volume>
          ,
          <fpage>365</fpage>
          -
          <lpage>384</lpage>
          (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Buranarach</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thein</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Supnithi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>A Community-Driven Approach to Development of an Ontology-Based Application Management Framework</article-title>
          .
          <source>Semantic Technology</source>
          .
          <volume>7774</volume>
          ,
          <fpage>306</fpage>
          -
          <lpage>312</lpage>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>