=Paper= {{Paper |id=Vol-1327/26 |storemode=property |title=Ontobat: An Ontology-based Semantic Web Approach for Linked Data Processing and Analysis |pdfUrl=https://ceur-ws.org/Vol-1327/icbo2014_paper_58.pdf |volume=Vol-1327 |dblpUrl=https://dblp.org/rec/conf/icbo/XiangLH14 }} ==Ontobat: An Ontology-based Semantic Web Approach for Linked Data Processing and Analysis== https://ceur-ws.org/Vol-1327/icbo2014_paper_58.pdf
                                                     ICBO 2014 Proceedings


     Ontobat: An Ontology-based Semantic Web
   Approach for Linked Data Processing and Analysis
                                        Zuoshuang Xiang, Yu Lin, Yongqun He*
 Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, Center for Computational Medicine and
   Bioinformatics, and Comprehensive Cancer Center, University of Michigan Medical School, Ann Arbor, MI 48109, USA

    Abstract: The Linked (Open) Data (LD/LOD) strategy extends the Web by publishing various open datasets as RDF links on the Web. To support linked data query and analysis, we developed Ontobat, a Semantic Web strategy for automatic generation of linked data RDF using ontology formats, data uploading to an RDF triple store, SPARQL query, browsing, and statistical data analysis. This report introduces the rationale, design, and preliminary implementation of the Ontobat system (http://ontobat.hegroup.org).

    Keywords: Ontobat; ontology; Semantic Web; LOD

                      I.   INTRODUCTION
   Ontologies are one of the major components of the Semantic Web and Linked Data movements. The Semantic Web enables machines to understand the meaning of information on the Web. The Linked Open Data (LOD) community aims to extend the Web by publishing various open datasets as Resource Description Framework (RDF) links on the Web. These RDF links can connect data items from different data sources and can be accessed from anywhere online [1]. Existing LOD data are primarily instance data. Ontologies provide classifications of, and relations among, these instance data.

   To support LOD data query and analysis, we have started to develop Ontobat (http://ontobat.hegroup.org), a web-based biodata analysis tool that uses ontology-based Semantic Web methods. Ontobat is developed to support LOD data generation, upload, query, browsing, and statistical analysis. In Ontobat, all RDF/OWL-based LOD data are generated based on reliable existing ontologies such as the OBO Foundry ontologies [2]. This report provides the first introduction to the Ontobat system design and development.

                II. ONTOBAT SYSTEM DESIGN
    Ontobat is designed to be an integrative system that includes several components (Fig. 1):
    Ontovert supports efficient conversion of instance data from tab-delimited text or MS Excel format to an ontology format using the Web Ontology Language (OWL).
    Ontoload loads instance data to an RDF triple store.
    The RDF triple stores can be developed using different systems, such as the open-source Virtuoso platform as implemented in our Hegroup RDF triple store [3].
    Lodquery provides RDF data query functions based on the SPARQL Protocol and RDF Query Language. A user-friendly web interface is usually required.
    Lodbee supports the browsing and dereferencing of LOD data. The LOD movement requires that URIs be used to denote things and that these URIs can be referred to and looked up (i.e., "dereferenced") by people and user agents [4]. Ontobee uniquely dereferences and presents ontology term URIs with a user-friendly HTML web display while providing RDF source code for remote Semantic Web query by software applications [3]. To support LOD data dereferencing and query, Lodbee adopts the Ontobee technology for representing instance data stored in LOD RDF triple stores.
    Ontostat provides statistical analysis of RDF-based LOD data, using open source software programs such as R-Sparql (http://code.google.com/p/r-sparql/), which runs SPARQL queries inside R and stores the results as an R data frame.

[Figure 1 diagram boxes: Ontovert: convert instance data to RDF/XML format; Ontoload: upload data to RDF triple store; RDF triple stores; Lodquery: LOD data SPARQL query; Lodbee: LOD data display and RDF source generation; Ontostat: LOD data statistical analysis (e.g., meta-analysis). Link label: run R Sparql.]
Fig. 1. Ontobat components and workflow design. Ontobat will store instance RDF data formatted based on OWL ontologies. The RDF data come from automatic data conversion and loading. The data can be visualized by Lodbee and queried by Lodquery. Statistical tools will be developed under Ontostat. Statistical results can also be uploaded to an RDF triple store.

           III. CURRENT ONTOBAT DEVELOPMENT
    Since the Ontobat system contains many components, we do not expect to develop all the programs simultaneously. Our development strategy is to implement one program at a time and later integrate all programs together.
    Currently, a prototype Ontobat program called Ontovert (http://ontobat.hegroup.org/ontovert/) has been developed (Fig. 2). The basic idea of Ontovert is to use the first row (or header) to list ontology class term URIs, and to use the other rows to represent data as instances of the class terms listed in the first row. The Ontovert web page provides example tab-delimited data extracted from a vaccine protection meta-analysis study [5]. The first row of the tab-delimited input data lists term IDs from the Vaccine Ontology (VO) [6]. After the VO is selected and the data are provided, the Ontovert program generates an OWL output file that specifies the instance data as named individuals of the VO terms. The relations of the VO terms are specified in VO and can be retrieved using the tool OntoFox [7]. The OntoFox feature is not yet implemented in Ontovert.
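The header-row convention described for Ontovert can be illustrated with a minimal Python sketch. This is not the Ontovert implementation. The individual namespace (http://example.org/ontovert/individual/), the use of rdfs:label to hold each cell value, and the N-Triples output (rather than RDF/XML) are assumptions made for illustration only. The merge helper shows one way two such outputs could be combined, as a set union of triples.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_NAMED_INDIVIDUAL = "http://www.w3.org/2002/07/owl#NamedIndividual"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Hypothetical namespace for generated individuals (an assumption, not Ontovert's).
BASE = "http://example.org/ontovert/individual/"


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def convert(tab_text):
    """Turn tab-delimited text into a list of N-Triples lines.

    The first row holds ontology class term URIs. Every non-empty cell in
    the following rows becomes a named individual typed with the class
    listed at the top of its column.
    """
    rows = [line.split("\t") for line in tab_text.splitlines() if line.strip()]
    if not rows:
        return []
    class_uris, data_rows = rows[0], rows[1:]
    triples = []
    for r, row in enumerate(data_rows, start=1):
        for c, (class_uri, value) in enumerate(zip(class_uris, row), start=1):
            value = value.strip()
            if not value:
                continue  # skip empty cells
            individual = f"<{BASE}row{r}_col{c}>"
            triples.append(f"{individual} <{RDF_TYPE}> <{OWL_NAMED_INDIVIDUAL}> .")
            triples.append(f"{individual} <{RDF_TYPE}> <{class_uri.strip()}> .")
            # Assumption: the raw cell value is kept as an rdfs:label.
            triples.append(f'{individual} <{RDFS_LABEL}> "{escape_literal(value)}" .')
    return triples


def merge(*triple_lists):
    """Merge N-Triples line lists, keeping order and dropping exact duplicates."""
    seen, merged = set(), []
    for triples in triple_lists:
        for triple in triples:
            if triple not in seen:
                seen.add(triple)
                merged.append(triple)
    return merged
```

For a one-column input whose header is http://purl.obolibrary.org/obo/VO_0001203 (assuming the usual OBO PURL form of the VO term ID) and whose only data row is 42, convert yields three triples: two type assertions and one label holding "42".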





The Ontovert and OntoFox OWL output files can, however, be merged to produce the output shown in Fig. 2.

[Figure 2 image labels: "As OntoFox input"; "OntoFox output"; "Merge two OWL files"; VO_0001203]
Fig. 2. Ontovert example. The output shows “42” days, an instance of the VO class ‘vaccination-challenge interval in days’ (VO_0001203). See the text for more explanation.

    A prototype Lodquery has also been established (http://ontobat.hegroup.org/lodquery). Lodquery uses the Hegroup RDF triple store [3] as the default triple store. The other programs listed in Fig. 1 (e.g., Ontoquery and Ontostat) are still under development.
    To show the usage of the Semantic Web in solving scientific questions in a specific domain, we have developed an Ontobat program called OntoCOG (http://ontobat.hegroup.org/ontocog) [8]. OntoCOG demonstrates how we use the Semantic Web approach to support statistical enrichment analysis of the Clusters of Orthologous Groups of proteins (COGs) [8].

                        IV. DISCUSSION
    Ontobat is an ontology-based Semantic Web system that primarily targets ontology-based instance data processing and analysis. The reliance on ontologies for instance RDF data generation is reflected in our Ontovert example (Fig. 2). The usage of reliable ontologies for RDF/OWL data generation provides a feasible way to integrate and share data, and it supports consistent and integrative data analysis.
    The Fig. 2 example originated from a previous study that modeled an Analysis of Variance (ANOVA) statistical analysis using the framework of the Ontology for Biomedical Investigations (OBI) [9]. To make Ontovert function more efficiently, the OntoFox feature shown in the Fig. 2 use case can be incorporated into the Ontovert program. Furthermore, the ANOVA analysis feature can be implemented in the Ontostat program in Ontobat. The Ontology of Biological and Clinical Statistics (OBCS) is a newly reported ontology that aligns with OBI and supports semantic biostatistics analysis [10]. Ontostat may use OBCS as the backend ontology for enhanced statistical analysis.
    While Ontobat is still in its early development stage, we would like to demonstrate the Ontobat design strategy and discuss the program design and implementation issues with researchers at the ICBO-2014 conference.

                        ACKNOWLEDGMENT
    This research was supported by NIH grant R01AI081062.

                          REFERENCES
[1]  T. Berners-Lee. (2009). Design Issues: Linked Data. Available: http://www.w3.org/DesignIssues/LinkedData
[2]  B. Smith, M. Ashburner, C. Rosse, J. Bard, W. Bug, W. Ceusters, et al., "The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration," Nat Biotechnol, vol. 25, pp. 1251-5, Nov 2007.
[3]  Z. Xiang, C. Mungall, A. Ruttenberg, and Y. He, "Ontobee: A linked data server and browser for ontology terms," in The 2nd International Conference on Biomedical Ontologies (ICBO), Buffalo, NY, USA, 2011, pp. 279-281 [http://ceur-ws.org/Vol-833/paper48.pdf].
[4]  R. Lewis. (2007, Nov 13). Dereferencing HTTP URIs. Available: http://www.w3.org/2001/tag/doc/httpRange-14/2007-05-31/HttpRange-14
[5]  T. E. Todd, O. Tibi, Y. Lin, S. Sayers, D. N. Bronner, Z. Xiang, et al., "Meta-analysis of variables affecting mouse protection efficacy of whole organism Brucella vaccines and vaccine candidates," BMC Bioinformatics, vol. 14 Suppl 6, p. S3, 2013.
[6]  Y. He, L. Cowell, A. D. Diehl, H. L. Mobley, B. Peters, A. Ruttenberg, et al., "VO: Vaccine Ontology," in The 1st International Conference on Biomedical Ontology (ICBO-2009), Buffalo, NY, 2009, URL: http://precedings.nature.com/documents/3552/version/1.
[7]  Z. Xiang, M. Courtot, R. R. Brinkman, A. Ruttenberg, and Y. He, "OntoFox: web-based support for ontology reuse," BMC Res Notes, vol. 3, p. 175, 2010.
[8]  Y. Lin, Z. Xiang, and Y. He, "Towards a Semantic Web application: Ontology-driven ortholog clustering analysis," Proceedings of the second International Conference on Biomedical Ontologies (ICBO), University at Buffalo, NY, July 26-30, 2011, pp. 33-40.
[9]  Y. He, Z. Xiang, T. Todd, M. Courtot, R. R. Brinkman, J. Zheng, et al., "Ontology representation and ANOVA analysis of vaccine protection investigation," in Bio-Ontologies 2010: Semantic Applications in Life Sciences, Boston, MA, USA, 2010, pp. 1-8 [http://ceur-ws.org/Vol-754/he_krmed2010.pdf].
[10] J. Zheng, M. R. Harris, A. M. Masci, Y. Lin, A. Hero, B. Smith, et al., "OBCS: The Ontology of Biological and Clinical Statistics," in The 2014 International Conference on Biomedical Ontologies (ICBO 2014), Houston, TX, USA, 2014, pp. 1-6.
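The Ontostat idea discussed above (SPARQL results loaded into a table and then analyzed, for example by ANOVA) can be sketched in Python rather than R. This is a minimal illustration under stated assumptions, not Ontostat or R-Sparql code: it assumes the query results arrive in the standard SPARQL 1.1 Query Results JSON format, and the variable names passed as group_key and value_key are hypothetical placeholders chosen by the caller.

```python
import json
from collections import defaultdict


def bindings_to_rows(sparql_json):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts,
    roughly what a data frame of query results would hold."""
    doc = json.loads(sparql_json)
    variables = doc["head"]["vars"]
    return [
        {v: binding[v]["value"] for v in variables if v in binding}
        for binding in doc["results"]["bindings"]
    ]


def one_way_anova_f(rows, group_key, value_key):
    """Return the one-way ANOVA F statistic for values grouped by group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(float(row[value_key]))
    k = len(groups)                                  # number of groups
    n = sum(len(vals) for vals in groups.values())   # total observations
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(sum(vals) for vals in groups.values()) / n
    # Between-group and within-group sums of squares.
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in groups.values()
    )
    ss_within = sum(
        (x - sum(vals) / len(vals)) ** 2
        for vals in groups.values()
        for x in vals
    )
    if ss_within == 0:
        raise ValueError("within-group variance is zero; F is undefined")
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

Keeping the parsing step separate from the statistic mirrors the workflow in Fig. 1, where the query results are first tabulated and only then passed to a statistical routine.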








Ontobat: An Ontology-based Semantic Web Approach for Linked Data Processing and Analysis
Zuoshuang “Allen” Xiang, Yu “Asiyah” Lin, and Yongqun “Oliver” He
University of Michigan Medical School, Ann Arbor, MI 48109, USA
Tel: (734) 615 8231
yongqunh@umich.edu
http://www.hegroup.org

Abstract
The Linked (Open) Data (LD/LOD) strategy extends the Web by publishing various open datasets as RDF links on the Web. To support linked data query and analysis, we developed Ontobat, a Semantic Web strategy for automatic generation of linked data RDF using ontology formats, data uploading to an RDF triple store, SPARQL query, browsing, and statistical data analysis. This report introduces the rationale, design, and preliminary implementation of the Ontobat system (http://ontobat.hegroup.org).

Introduction
Ontologies are one of the major components of the Semantic Web and Linked Data movements. The Semantic Web enables machines to understand the meaning of information on the Web. The Linked Open Data (LOD) community aims to extend the Web by publishing various open datasets as Resource Description Framework (RDF) links on the Web. These RDF links can connect data items from different data sources and can be accessed from anywhere online [1]. Existing LOD data are primarily instance data. Ontologies provide classifications of, and relations among, these instance data.

To support LOD data query and analysis, we have started to develop Ontobat (http://ontobat.hegroup.org), a web-based biodata analysis tool that uses ontology-based Semantic Web methods. Ontobat is developed to support LOD data generation, upload, query, browsing, and statistical analysis. In Ontobat, all RDF/OWL-based LOD data are generated based on reliable existing ontologies such as the OBO Foundry ontologies. This report provides the first introduction to the Ontobat system design and development.

Ontobat System Design
Ontobat is designed to be an integrative system with many components (Fig. 1):
• Ontovert supports efficient conversion of instance data from tab-delimited text or MS Excel format to an ontology format using the Web Ontology Language (OWL).
• Ontoload loads instance data to an RDF triple store.
• The RDF triple stores can be developed using different systems, e.g., the open-source Virtuoso platform as implemented in our Hegroup RDF triple store [2].
• Lodquery provides RDF data query functions based on the SPARQL Protocol and RDF Query Language. A user-friendly web interface is usually required.
• Lodbee supports the browsing and dereferencing of LOD data. The LOD movement requires that URIs be used to denote things and that these URIs can be referred to and looked up (i.e., "dereferenced") by people and user agents. Ontobee uniquely dereferences and presents ontology term URIs with a user-friendly HTML web display while providing RDF source code for remote Semantic Web query by software applications. To support LOD data dereferencing and query, Lodbee adopts the Ontobee technology for representing instance data stored in LOD RDF triple stores.
• Ontostat provides statistical analysis of RDF-based LOD data, using open source software programs such as R-Sparql (http://code.google.com/p/r-sparql/), which runs SPARQL queries inside R and stores the results as an R data frame.

[Figure 1 diagram boxes: Ontovert: convert instance data to RDF/XML format; Ontoload: upload data to RDF triple store; RDF triple stores; Lodquery: LOD data SPARQL query; Lodbee: LOD data display and RDF source generation; Ontostat: LOD data statistical analysis (e.g., meta-analysis). Link label: run R Sparql.]
Fig. 1. Ontobat components and workflow design. Ontobat will store instance RDF data formatted based on OWL ontologies. The RDF data come from automatic data conversion and loading. The data can be visualized by Lodbee and queried by Lodquery. Statistical tools will be developed under Ontostat. Statistical results can also be uploaded to an RDF triple store.

Current Ontobat Development
Since the Ontobat system contains many components, we do not expect to develop all the programs simultaneously. Our development strategy is to implement one program at a time and later integrate all programs together.
Currently, a prototype Ontobat program called Ontovert (http://ontobat.hegroup.org/ontovert/) has been developed (Fig. 2). The basic idea of Ontovert is to use the first row (or header) to list ontology class term URIs, and to use the other rows to represent data as instances of the class terms listed in the first row. The Ontovert web page provides example tab-delimited data extracted from a vaccine protection meta-analysis study [3].
A prototype Lodquery has also been established (http://ontobat.hegroup.org/lodquery). Lodquery uses the Hegroup RDF triple store [2] as the default triple store. The other programs listed in Fig. 1 (e.g., Ontoquery and Ontostat) are still under development.
To show the usage of the Semantic Web in solving scientific questions in a specific domain, we have developed an Ontobat program called OntoCOG (http://ontobat.hegroup.org/ontocog) [4]. OntoCOG demonstrates how we use the Semantic Web approach to support statistical enrichment analysis of the Clusters of Orthologous Groups of proteins (COGs) [4].

[Figure 2 image labels: "As OntoFox input"; "OntoFox output"; "Merge two OWL files"; VO_0001203]
Fig. 2. An Ontovert example. The Ontovert output shows “42” days, an instance of the Vaccine Ontology (VO) class ‘vaccination-challenge interval in days’ (VO_0001203). The first row of the tab-delimited input data lists term IDs from the VO [5]. After the VO is selected and the data are provided, Ontovert generates an OWL output file that specifies the instance data as named individuals of the VO terms. The relations of the VO terms are specified in VO and can be retrieved using the tool OntoFox [6]. The OntoFox feature is not yet implemented in Ontovert. However, the Ontovert and OntoFox OWL output files can be merged to produce the output shown.

Discussion
Ontobat is an ontology-based Semantic Web system that primarily targets ontology-based instance data processing and analysis. The usage of reliable ontologies for RDF/OWL data generation provides a feasible way to integrate and share data, and it supports consistent and integrative data analysis.
The ANOVA analysis feature can be implemented in the Ontostat program in Ontobat. The Ontology of Biological and Clinical Statistics (OBCS) is a newly reported ontology that aligns with OBI and supports semantic biostatistics analysis [7]. Ontostat may use OBCS as the backend ontology for enhanced statistical analysis.

Acknowledgements
This work is supported by NIH-NIAID Grant 1R01AI081062 to YH.

References
1. T. Berners-Lee. (2009). Design Issues: Linked Data. Available: http://www.w3.org/DesignIssues/LinkedData
2. Z. Xiang, C. Mungall, A. Ruttenberg, and Y. He, "Ontobee: A linked data server and browser for ontology terms," in The 2nd International Conference on Biomedical Ontologies (ICBO), Buffalo, NY, USA, 2011, pp. 279-281 [http://ceur-ws.org/Vol-833/paper48.pdf].
3. Y. He, Z. Xiang, T. Todd, M. Courtot, R. Brinkman, J. Zheng, C. J. Stoeckert, J. Malone, P. Rocca-Serra, S. Sansone, J. Fostel, L. N. Soldatova, B. Peters, and A. Ruttenberg, "Ontology representation and ANOVA analysis of vaccine protection investigation," in Proceedings of Bio-Ontologies 2010: Semantic Applications in Life Sciences, ISMB, July 9-10, 2010, Boston, MA, USA.
4. Y. Lin, Z. Xiang, and Y. He, "Towards a Semantic Web application: Ontology-driven ortholog clustering analysis," Proceedings of the second International Conference on Biomedical Ontologies (ICBO), University at Buffalo, NY, July 26-30, 2011, pp. 33-40.
5. Y. He, L. Cowell, A. D. Diehl, H. L. Mobley, B. Peters, A. Ruttenberg, et al., "VO: Vaccine Ontology," in The 1st International Conference on Biomedical Ontology (ICBO-2009), Buffalo, NY, 2009, URL: http://precedings.nature.com/documents/3552/version/1.
6. Z. Xiang, M. Courtot, R. R. Brinkman, A. Ruttenberg, and Y. He, "OntoFox: web-based support for ontology reuse," BMC Res Notes, vol. 3, p. 175, 2010.
7. J. Zheng, M. R. Harris, A. M. Masci, Y. Lin, A. Hero, B. Smith, et al., "OBCS: The Ontology of Biological and Clinical Statistics," in The 2014 International Conference on Biomedical Ontologies (ICBO 2014), Houston, TX, USA, 2014, pp. 1-6.


