=Paper= {{Paper |id=Vol-538/paper-10 |storemode=property |title=Enabling Tailored Therapeutics with Linked Data |pdfUrl=https://ceur-ws.org/Vol-538/ldow2009_paper9.pdf |volume=Vol-538 }} ==Enabling Tailored Therapeutics with Linked Data== https://ceur-ws.org/Vol-538/ldow2009_paper9.pdf
Enabling Tailored Therapeutics with Linked Data

Anja Jentzsch
Freie Universität Berlin, Web-based Systems Group
Garystr. 21, 14195 Berlin, Germany
mail@anjajentzsch.de

Bo Andersson
AstraZeneca R&D Lund
221 87 Lund, Sweden
bo.h.andersson@astrazeneca.com

Oktie Hassanzadeh
University of Toronto, Database Group
10 King’s College Rd, Toronto, Canada
oktie@cs.toronto.edu

Susie Stephens
Eli Lilly and Company
Lilly Corporate Center, Indianapolis, Indiana 46285, USA
Stephens_Susie_M@Lilly.com

Christian Bizer
Freie Universität Berlin, Web-based Systems Group
Garystr. 21, 14195 Berlin, Germany
chris@bizer.de

ABSTRACT
Advances in the biological sciences are allowing pharmaceutical companies to meet the health care crisis with drugs that are more suitable for preventive and tailored treatment, thereby holding the promise of enabling more cost effective care with greater efficacy and reduced side effects. However, this shift in business model increases the need for companies to integrate data across drug discovery, drug development, and clinical practice. This is a fundamental shift from the approach of limiting integration activities to functional areas. The Linked Data approach holds much potential for enabling such connectivity between data silos, thereby enabling pharmaceutical companies to meet the urgent needs in society for more tailored health care. This paper examines the applicability and potential benefits of using Linked Data to connect drug and clinical trials related data sources and gives an overview of ongoing work within the W3C's Semantic Web for Health Care and Life Sciences Interest Group on publishing drug related data sets on the Web and interlinking them with existing Linked Data sources. A use case is provided that demonstrates the immediate benefit of this work in enabling data to be browsed from disease, to clinical trials, drugs, targets and companies.

Categories and Subject Descriptors
H.3.5 [Online Information Services]: Data Sharing

General Terms
Experimentation, Languages

Keywords
Linked Data, Semantic Web, Tailored Therapeutics, Drugs, Clinical Trials, Competitive Intelligence

1. INTRODUCTION
The crisis in health care is changing the business model of pharmaceutical companies to discovering and developing drugs that are suitable for preventive and tailored treatment regimes [1, 2]. This shift requires a more systematic approach to integrating and interpreting information spanning genes, proteins, pathways, targets, diseases, drugs, and patients [3]. The amount of publicly available data that is relevant for drug discovery has grown significantly over recent years [4, 5], and has reached a point where present tools are no longer effective. Scientists need new, more efficient ways to interrogate data than simply jumping from one public data source to another. This is because there are too many disparate data sources for scientists to conceptualize their relationships and remember that they all exist, let alone master the different user interfaces and inconsistent terminology. Further, the prevalence of single query input fields makes it difficult for scientists to retrieve precise information of interest, and to retrieve data that spans different data sources.

Linked Data has the potential to ease access to these data for scientists and managers by making the connections between the data sets explicit in the form of data links. This can be accomplished using RDF as a standardized data representation format, HTTP as a standardized access mechanism, and through the development of algorithms for discovering the links between data sets. Such explicit links allow scientists to navigate between data sets and discover connections they might not have been aware of previously. The standardized representation and access mechanisms allow generic tools, such as Semantic Web browsers and search engines, to be employed to access and process the data.

The Linking Open Drug Data (LODD) task within the W3C's Semantic Web for Health Care and Life Sciences Interest Group¹ gathered a list of data sets that include information about drugs, and then determined how the publicly available data sets could be linked together. The review showed that this domain is promising for Linked Data as there are many publicly available data sets, and they frequently share identifiers for key entities. The complete evaluation results are posted on the W3C ESW Wiki².

¹ http://esw.w3.org/topic/HCLSIG/LODD
² http://esw.w3.org/topic/HCLSIG/LODD/Data/DataSetEvaluation

Copyright is held by the author/owner(s).
LDOW2009, April 20, 2009, Madrid, Spain.

Participants of the LODD task have undertaken to demonstrate the value of Linked Data to the health care and life sciences
domain. This has been achieved by publishing and linking several drug related data sets on the Web, and investigating use cases that demonstrate how researchers in life science, as well as physicians and patients, can take advantage of the connected data sets.

This paper is structured as follows: Section 2 describes the published data sets, their linkage with other published data sources, and the methods that were used to create the links. Section 3 exemplifies how navigating linked data can be utilized within a competitive intelligence use case. Section 4 summarizes our findings and experiences from publishing and navigating the data sets.

2. LINKED DATA SETS
In this project, data about pharmaceutical companies, drugs in clinical trials, mechanisms of action of drugs, safety information, and data about disease gene correlations were added to the Linked Data cloud. This selection of data sets enabled strong connections to existing Linked Data resources, while providing novel data of interest to the pharmaceutical industry. The existing Linked Data of primary interest to this work includes the many bioinformatics and cheminformatics data sources published by Bio2RDF [6], and the information on diseases and marketed drugs in DBpedia [7]. The linkage of the newly published data sets to each other and relevant existing Linked Data is shown in Figure 1.

The Linked Clinical Trials (LinkedCT) data source³ is derived from a service provided by the U.S. National Institutes of Health, ClinicalTrials.gov, a registry of more than 60,000 clinical trials conducted in 158 countries. Each trial is associated with a brief description, related disorders⁴ and interventions, eligibility criteria, sponsors, locations (investigators), and several other pieces of information. The data on LinkedCT is obtained by first transforming the XML data provided by ClinicalTrials.gov to relational data using the capabilities of a hybrid relational-XML database management system such as IBM DB2. This transformation requires identifying the entities and facts in the XML data and storing them in reasonably normalized relational tables that are appropriate for transformation into RDF. The RDF data is then published using D2R server [8]. The RDF version of the dataset contains 7,011,000 triples and 290,000 links.

DrugBank [9] is a large repository of almost 5000 FDA-approved small molecule and biotech drugs. It contains detailed information about drugs including chemical, pharmacological and pharmaceutical data, along with comprehensive drug target data such as sequence, structure, and pathway information. The data was originally published as DrugBank DrugCards⁵ and was re-published as Linked Data using D2R server. The Linked Data version of DrugBank contains 1,153,000 triples and 60,300 links⁶.

Diseasome [10] contains information about 4,300 disorders and disease genes linked by known disorder–gene associations for exploring known phenotype and disease gene associations and indicating the common genetic origin of many diseases. The list of disorders, disease genes, and associations between them was obtained from the Online Mendelian Inheritance in Man (OMIM)⁷, a compilation of human disease genes and phenotypes. The data set is published by Diseasome in a flat file representation. The flat files were read into a relational database and made accessible as Linked Data using D2R server. The Linked Data version of Diseasome contains 88,000 triples and 23,000 links⁸.

DailyMed⁹ is published by the National Library of Medicine, and provides high quality information about marketed drugs. DailyMed provides much information including general background on the chemical structure of the compound and its mechanism of action, details on the clinical pharmacology of the compound, indication (disorder) and usage, contraindications, warnings, precautions, adverse reactions, overdosage, and patient counseling. The data was originally published in Structured Product Labeling¹⁰, an XML-based standard for exchanging medication information that has been recently introduced by the Food and Drug Administration in the United States. It was published using the D2R server. The Linked Data version of DailyMed contains 124,000 triples and 29,600 links¹¹.

Figure 1. This figure shows the incorporation of LinkedCT, DailyMed, DrugBank, and Diseasome into the Linked Data cloud. These data are represented in dark gray, while light gray represents other Linked Data from the life sciences, and white indicates interlinked datasets covering geographic, person-related and conceptual data.

³ http://linkedct.org
⁴ disorder is used as a synonym for disease and indication, http://en.wikipedia.org/wiki/Disease#Disorder
⁵ http://www.drugbank.ca/fields
⁶ http://www4.wiwiss.fu-berlin.de/drugbank/
⁷ www.ncbi.nlm.nih.gov/omim
⁸ http://www4.wiwiss.fu-berlin.de/diseasome/
⁹ http://dailymed.nlm.nih.gov/
¹⁰ http://www.fda.gov/oc/datacouncil/SPL.html
¹¹ http://www4.wiwiss.fu-berlin.de/dailymed/

There are many commonly used identifiers in the life sciences that can be utilized for making links between data sets explicit. Links that were generated based on shared identifiers include the connections from LinkedCT to Bio2RDF's PubMed, and from DrugBank to DBpedia. The connections between bioinformatics and cheminformatics data sources are already provided by Bio2RDF, allowing us to interlink our drug-related data sets to their work. In cases where no shared identifiers exist, string and semantic matching techniques were applied for link discovery
[11]. Approximate string matching was employed to interlink LinkedCT and Diseasome, where for instance "Alzheimer's disease" in LinkedCT was matched with "Alzheimer_disease" in Diseasome. Semantic matching is especially useful in matching clinical terms as many drugs and diseases have multiple names. Drugs tend to have generic names and brand names; for example, "Varenicline" has the synonym "Varenicline Tartrate" and the brand names "Champix" and "Chantix".

Table 1. Numbers of outgoing data links from the published drug related data sets.

 Data set     Number of links
 LinkedCT     290,000 links; 50,000 of them inside the LODD cloud
 DrugBank     23,000 links; 8,500 of them inside the LODD cloud
 DailyMed     29,600 links; all of them inside the LODD cloud
 Diseasome    23,000 links; 8,400 of them inside the LODD cloud

Table 1 summarizes the number of links from our published data sets to Linked Data within the LODD cloud and beyond. Table 2 differentiates the number and type of links between data sources and indicates their frequency. A double headed arrow in the first column indicates that the links are bidirectional, while a single headed arrow indicates unidirectional links.

Table 2. Type and frequency of links between the LODD data sets, and between LODD and Bio2RDF.

 Source / Target                                   Link Type                        Count
 LinkedCT (intervention) ↔ DailyMed (drug)         owl:sameAs                       27,685
 LinkedCT (intervention) ↔ DrugBank (drug)         owl:sameAs                       12,127
 LinkedCT (intervention) ↔ DBpedia (drug)          rdfs:seeAlso                     8,848
 LinkedCT (condition) ↔ DBpedia (disease)          owl:sameAs                       444
 LinkedCT (condition) ↔ Diseasome (disease)        owl:sameAs                       301
 LinkedCT (trial) → Geonames                       foaf:based_near                  129,177
 LinkedCT (reference) → Bio2RDF’s PubMed           owl:sameAs                       42,219
 LinkedCT (trial) → ClinicalTrials.gov             foaf:page                        61,920
 DrugBank (drug) ↔ Diseasome (disease)             drugbank:possibleDiseaseTarget   8,201
 DrugBank (drug) ↔ DailyMed (drug)                 drugbank:brandedDrug             1,593
 DrugBank (drug) ↔ DBpedia (drug)                  owl:sameAs                       1,522
 DrugBank (drug target) → Bio2RDF’s PFAM           drugbank:pfamDomainFunction      19,028
 DrugBank (drug) → Bio2RDF’s UniProt               drugbank:enzymeSwissprotId       4,660
 DrugBank (drug) → Bio2RDF’s IUPAC                 drugbank:iupacId                 4,592
 DrugBank (drug target) → Bio2RDF’s PDB            drugbank:pdbId                   3,379
 DrugBank (drug) → Bio2RDF’s CAS                   drugbank:casRegistryNumber       2,240
 DrugBank (drug) → Bio2RDF’s HGNC                  drugbank:hgncId                  1,675
 DrugBank (drug) → Bio2RDF’s KEGG Compound         drugbank:keggCompoundId          1,331
 DrugBank (drug) → Bio2RDF’s KEGG Drug             drugbank:keggDrug                913
 DrugBank (drug) → Bio2RDF’s ChEBI                 drugbank:chebiId                 736
 Diseasome (gene) → Bio2RDF’s Symbol               diseasome:bio2rdfSymbol          9,743
 Diseasome (disease) → Bio2RDF’s OMIM              diseasome:omim                   2,929
 Diseasome (gene) → Bio2RDF’s HGNC                 diseasome:hgncId                 688
 Diseasome (gene) → Bio2RDF’s GeneID               diseasome:geneId                 688

3. COMPETITIVE INTELLIGENCE CASE STUDY
A use case has been developed that demonstrates the value of Linked Data about drugs to the pharmaceutical industry. Departments within pharmaceutical companies have typically decided independently which data sets need to be brought into their organization for integration and interrogation. Access to the data is provided to employees based upon their roles. The use case describes the value that can be gained by allowing employees to gain access to a more diverse and linked body of data. This approach enables novel questions to be explored. The following use case describes a scenario in competitive intelligence.

A neuroscience focused business manager is interested in seeing an update on new clinical trials that competitors are starting in Alzheimer’s Disease (AD). These updates influence future sales forecasts across geographies, and impact portfolio decisions as new drugs need to demonstrate improved safety and efficacy compared to the existing pharmacopeia.

Using a Semantic Web browser of choice, for instance Tabulator¹² or the Marbles data browser¹³, the manager is able to see all drugs in trials for AD in LinkedCT, including a new phase III trial planned by Pfizer for a drug called Varenicline. The business manager can see that more information is available about the drug, which is unusual because not much data is typically available for drugs that are under investigation. Following the data link, the manager sees data from DailyMed that shows that the drug is already on the market for nicotine addiction.

¹² http://www.w3.org/2005/ajar/tab
¹³ http://beckr.org/marbles

As side effects are better understood for drugs that are already on the market, such drugs tend to be more successful in trials. Out of curiosity, the manager scrolls down the page to see that side effects are listed as constipation, sleeping problems, vomiting, nausea, and gas; and that the typical dose is 1 mg twice daily. The dose stated on LinkedCT for the trial was no higher than that, so it is unlikely that this drug will have new safety problems.
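The link types listed in Table 2, and the browsing steps in this use case, come down to following typed links from a resource in one data set to resources in others. The following is a minimal sketch of that idea in Python, not the mechanism used by the data browsers mentioned above. It works on a small in-memory list of triples, and the resource identifiers and the `follow_links` helper are hypothetical, chosen only for illustration. For simplicity, links are followed in the outgoing direction only.

```python
from collections import deque

# Hypothetical (subject, predicate, object) triples. The identifiers are
# illustrative assumptions and are not copied from the published data sets.
TRIPLES = [
    ("linkedct:intervention/varenicline", "owl:sameAs", "dailymed:drug/chantix"),
    ("linkedct:intervention/varenicline", "owl:sameAs", "drugbank:drug/varenicline"),
    ("drugbank:drug/varenicline", "drugbank:possibleDiseaseTarget",
     "diseasome:disease/nicotine_addiction"),
    ("drugbank:drug/varenicline", "owl:sameAs", "dbpedia:Varenicline"),
]


def follow_links(start, triples, predicates=None, max_hops=2):
    """Breadth-first walk over outgoing links.

    Returns a dict mapping each reachable resource to the number of hops
    needed to reach it from `start`. If `predicates` is given, only links
    with one of those predicates are followed.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] >= max_hops:
            continue  # do not expand beyond the hop limit
        for subject, predicate, obj in triples:
            if subject != node:
                continue
            if predicates is not None and predicate not in predicates:
                continue
            if obj not in hops:
                hops[obj] = hops[node] + 1
                queue.append(obj)
    # The start resource itself is not reported as "reached".
    return {resource: n for resource, n in hops.items() if resource != start}


if __name__ == "__main__":
    reached = follow_links("linkedct:intervention/varenicline", TRIPLES)
    for resource, n in sorted(reached.items(), key=lambda item: item[1]):
        print(n, resource)
```

Restricting `predicates` to `{"owl:sameAs"}` keeps the walk to resources asserted to be identical, while leaving it unset also follows domain-specific links such as `drugbank:possibleDiseaseTarget`.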
Given the promising safety profile, the manager is curious to discover why a nicotine addiction drug might work for AD. Linking to DrugBank highlights to the manager that Varenicline is an alpha-4 beta-2 neuronal nicotinic acetylcholine receptor agonist. However, Diseasome indicates that the corresponding genes are only important in nicotine addiction, rather than AD. This suggests that there is a more complex relationship between the diseases than just sharing a drug target. Extending the browsing to the SWAN Knowledgebase¹⁴ [12] shows that there are hypotheses relating AD to nicotinic receptors through amyloid beta [13].

Using the Linked Data approach, a business manager was able to browse data relating to companies, clinical trials, drugs, diseases and genetic variation. More specifically, the manager was able to determine when extra data was available, gain access to data without needing to map different identifiers and synonyms, and gain additional insights as to interesting questions to ask.

Figure 2. Data relating to Varenicline from LinkedCT, DrugBank and Diseasome shown within the Marbles data browser.

4. OUTLOOK
This paper describes the mapping of four drug related data sources into the Linked Data cloud, and the ensuing insights that can be gained in the area of competitive intelligence. However, this is just the beginning, because more interesting and novel questions can be addressed as additional data sets are added. As a next step, it would be interesting to incorporate data relating to epidemiology, as that could provide information relating to geographical areas in which diseases are prevalent, and where there is a strong need for the development of a drug that meets the needs of a specific population. It would also be valuable to create links to the AD hypotheses data that is in RDF within the SWAN Knowledgebase.

Pharmaceutical companies need to make decisions based upon both internal and external data. It is therefore important that companies begin to make internal data available in a linked representation, both to break down the internal silos and to easily connect with external data. Such an approach would require organizations to understand where the linkage points occur across internal data sets, but this is ongoing work as it is a critical prerequisite for all data integration efforts relating to the effective tailoring of drugs.

Currently, when pharmaceutical companies bring copies of data within their organizations for integration, they each need to have experts who understand the connectivity across data sets. However, with the Linked Data approach, this responsibility is shifted to the data providers. This is a much more efficient approach, as the data providers are the individuals who understand the data best. It also means that the integration only has to happen one time. In addition, it becomes possible for data providers to incrementally add links to new data sets as they become aware of their existence, rather than needing to design a model to do everything in one go. As stated in [14], reasoning and querying limitations can often be compensated for by integrating additional data resources.

As the Linked Data cloud grows, focus in pharmaceutical companies will move to approaches for interpretation. One project with potential to utilize the value from Linked Data is the Large Knowledge Collider (LarKC), a platform for massive distributed incomplete reasoning that aims at removing the scalability barriers of currently existing reasoning systems for the Semantic Web¹⁵.

¹⁴ http://hypothesis.alzforum.org/swan/
¹⁵ http://www.larkc.eu/

The Linked Data approach is very promising for the pharmaceutical industry, and its value will increase as more data sources become available. However, our technical work as well as use case experiments revealed various challenges that need to be mitigated to make this approach robust enough to be deployed within an enterprise environment:

1.   Progress needs to be made in finding links between data
     items across data sets where no commonly used identifiers
     exist. Discovering such links requires using specific record
     linkage [15] and duplicate detection [16] techniques
     developed within the database community as well as
     ontology matching [17] methods from the knowledge
     representation literature. Recent work has proposed
     frameworks for simplifying this task for RDF data sets [18]
     and relational data [11]. In order to benefit from these
     frameworks for setting links within the LODD data sets,
     domain experts need to identify linkage points and specific
     rules required for finding the links.
2.   Work needs to be undertaken to make data browsers more
     robust and performant. In addition, the user interface of data
     browsers needs to be improved. Life Sciences data
     frequently consists of long lists of entities (e.g. genes, trials,
     diseases, patients) that need to be browsed, filtered, and
     queried. Hybrid interfaces that combine querying and
     browsing, and that can process the large amounts of data
     typically relevant within this domain, would be beneficial.
     For such interfaces, it could be promising to combine live
     data retrieval with local caching and in-advance crawling of
     relevant data sets, as is currently done by Semantic Web
     search engines such as Sindice [19] and Falcons [20].
3.   A significant challenge within the life sciences and health
     care is the strong prevalence of terminology conflicts,
     synonyms, and homonyms. These problems are not
     addressed by simply making data sets available on the Web
     using RDF as a common syntax but require deeper semantic
     integration. For applications that focus on discovery and data
     navigation, having explicit links between data sources is
     often already a huge benefit even without semantic
     integration. For other applications that rely on expressive
     querying or automated reasoning, deeper integration is
     essential. In order to also provide for such applications and
     lay the foundation for fusing data from several Linked Data
     sources, it would be beneficial if more community practices
     for publishing term and schema mappings were established.

5. ACKNOWLEDGEMENTS
This work was undertaken within the LODD task of the W3C's
Semantic Web for Health Care and Life Sciences Interest Group.
Significant contributions to the LODD task have also been made
by Kei Cheung, Don Doherty, Matthias Samwald, and Jun Zhao.
Anja Jentzsch and Chris Bizer received funding for this work
from Eli Lilly.

6. REFERENCES
[1] Healthcare 2015: Win-win or lose-lose?
     www.Ibm.com/healthcare/hc2015.

[2] Gerhardsson de Verdier, M.: The Big Three Concept - A
     Way to Tackle the Health Care Crisis? Proc. Am. Thorac.
     Soc. 5: 800–805, 2008.

[3] Andersson B., Momtchev V.: D7a.1.1 LarKC Requirements
     summary and data repository,
     http://wiki.larkc.eu/LarkcProject/WP7a.

[4] Sharp, M., Bodenreider, O., and Wacholder, N.: A
     framework for characterizing drug information sources.
     AMIA Annu. Symp. Proc. 2008 Nov 6:662-666.
     http://www.ncbi.nlm.nih.gov/pubmed/18999182.

[5] Goble, C., Stevens, R.: State of the Nation in Data
     Integration for Bioinformatics. J. Biomed. Infor. 41: 687-693,
     2008.

[6] Belleau F., Nolin, M.-A., Tourigny N., Rigault, P., and
     Morissette, J.: Bio2RDF: Towards a mashup to build
     bioinformatics knowledge systems. J. Biomed. Infor. 41:
     706-716, 2008.

[7] Auer, S., Bizer, C., Lehmann, J., Kobilarov, G., Cyganiak,
     R., Ives, Z.: DBpedia: A Nucleus for a Web of Open Data. In:
     Proceedings of the 6th International Semantic Web
     Conference. Lecture Notes in Computer Science 4825,
     Springer, ISBN 978-3-540-76297-3, 2007.

[8] Bizer, C., Cyganiak, R.: D2R Server - Publishing Relational
     Databases on the Semantic Web. Poster at the 5th
     International Semantic Web Conference, 2006.

[9] Wishart D.S., Knox C., Guo A.C., Shrivastava S., Hassanali
     M., Stothard P., Chang Z., Woolsey J.: DrugBank: a
     comprehensive resource for in silico drug discovery and
     exploration. Nuc. Acids Res. 1(34): D668-72, 2006.

[10] Goh K.-I., Cusick M.E., Valle D., Childs B., Vidal M.,
     Barabási A.L.: The human disease network. Proc. Natl.
     Acad. Sci. USA 104:8685-8690, 2007.

[11] Hassanzadeh O., Lim L., Kementsietsidis A., and Wang M.:
     A Declarative Framework for Semantic Link Discovery over
     Relational Data. Poster at the 18th World Wide Web
     Conference, 2009.

[12] Gao Y., Kinoshita J., Wu E., Miller E., Lee R., Seaborne A.,
     Cayzer S., Clark T.: SWAN: A Distributed Knowledge
     Infrastructure for Alzheimer Disease Research. J. Web Sem.
     4(3): 222-228, 2006.

[13] Dineley, K.T., Westerman, M., Bui, D., Bell, K., Ashe K.H.,
     Sweatt, J.D.: β-Amyloid Activates the Mitogen-Activated
     Protein Kinase Cascade via Hippocampal α7 Nicotinic
     Acetylcholine Receptors: In Vivo Mechanisms Related to
     Alzheimer’s Disease. J. Neurosci. 21(12):4125-4133, 2001.

[14] Sahoo, S., Bodenreider, O., Rutter, J., Skinner, K., and
     Sheth, A.: An ontology-driven semantic mashup of gene and
     biological pathway information: Application to the domain
     of nicotine dependence. Journal of Biomedical Informatics
     41: 752-765, 2008.

[15] Elmagarmid, A.K., Ipeirotis, P.G., Verykios, V.S.: Duplicate
     record detection: A survey. IEEE Trans. Knowledge and
     Data Engineering, 19(1): 1–16, 2007.

[16] Winkler, W.: Overview of Record Linkage and Current
     Research Directions. Bureau of the Census, Technical
     Report, 2006.

[17] Euzenat, J., Shvaiko, P.: Ontology Matching. Springer,
     Heidelberg, 2007.

[18] Volz, J., Bizer C., Gaedke, M., and Kobilarov, G.: Silk – A
     Link Discovery Framework for the Web of Data. In: Linked
     Data on the Web workshop at WWW2009, 2009.

[19] Tummarello G. et al.: Sindice.com: Weaving the Open
     Linked Data. In: 6th International Semantic Web
     Conference, 2007.

[20] Gong Cheng, H. W., Weiyi Ge, Qu Y.: Searching Semantic
     Web Objects Based on Class Hierarchies. In: Linked Data on
     the Web workshop at WWW2008, 2008.