<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Web Application Towards Semiotic-based Evaluation of Biomedical Ontologies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Muhammad "Tuan" Amith</string-name>
          <email>muhammad.f.amith@uth.tmc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cui Tao</string-name>
          <email>cui.tao@uth.tmc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Texas Health Science Center, School of Biomedical Informatics</institution>
          ,
          <addr-line>Houston, Texas</addr-line>
          ,
          <country country="US">United States</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>With the growing importance of biomedical ontology research for Big Biomedical Data, there is a need for knowledge base evaluation that is systematic and that also engages a community of experts. This paper introduces a prototype, currently under development, of an online web tool that evaluates the quality of formal ontologies using semiotic-influenced metrics to grade ontology quality. The tool, the Semiotic-based Evaluation Management System (SEMS), is designed to provide (1) automatic generation of various quality scores for an uploaded ontology, together with recommendations for improving the ontology, and (2) a GUI for experts to conduct manual review and provide feedback. In this paper, we discuss the current status of the tool as well as the course of its continued development.</p>
      </abstract>
      <kwd-group>
        <kwd>Ontology</kwd>
        <kwd>Ontology Evaluation</kwd>
        <kwd>Big Data</kwd>
        <kwd>Semiotics</kwd>
        <kwd>Knowledge Engineering</kwd>
        <kwd>Knowledge Management</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        An article in Scientific American [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] described Tim Berners-Lee's vision of the
"Semantic Web", or linked meaningful data on the web. Most interestingly, the
use case that elaborated his vision was a health care scenario in which patients
access health information through software agents. While the semantic web
vision ("Web 3.0") may (or may not) be realized in the foreseeable future, the
already copious amount of health information on the World Wide Web is growing.
      </p>
      <p>
        The massive growth of information has ushered in a new discipline called Big
Data. Big Data technologies, according to the International Data Corporation,
"describe a new generation of technologies and architectures, designed to
economically extract value from very large volumes of a wide variety of data, by
enabling the high-velocity capture, discovery, and/or analysis." [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] Healthcare
has been dramatically affected by these new technologies, saving $300 billion
through analytics of Big Data [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], mitigating diseases, and affecting patient
health behaviors [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Building on their success with "Small Data", biomedical ontologies can and
will play an important role in Big Data, specifically in consolidating the variety
in Big Data and introducing reasoning and analytical functions.
While this success is well documented in the vast biomedical literature
highlighting biomedical ontologies for encoding knowledge and machine
reasoning, the evaluation of ontologies is not settled [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ][
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Ontology evaluation "is
the problem of assessing a given ontology from the point of view of a
particular criterion of application, typically in order to determine which of several
ontologies would best suit a particular purpose" [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Over the last decade, several
ideas have emerged for addressing ontology evaluation [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], but none appears to have
been adopted universally by ontologists [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Commonly, subject matter expert
(SME) reviewers are sought to evaluate an ontology. However, this is a
time- and resource-intensive approach, especially if the reviewers need to
familiarize themselves with ontologies and ontology-related tools, such as Protégé
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. A brief review of 200 randomly selected biomedical ontologies hosted on the
National Center for Biomedical Ontology's (NCBO) BioPortal reveals that only
17 of the 200 have a formal assessment described in a corresponding design
paper; the remainder have no explicitly documented evaluation. With
ontologies helping to further research in the biomedical domain, this highlights
a strong need for the evaluation of biomedical ontologies.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Semiotics in Ontology Evaluation</title>
      <p>
        Ontologies are sometimes described as symbolic representations of a domain space,
where the terms signify the entities contained within that space. Likewise,
semiotics is the study of the meaning behind signs and symbols, or representations,
and is divided into three branches: pragmatic, syntactic, and semantic. Burton-Jones
et al. introduced an ontology evaluation framework based on the theories of
semiotics that uses metrics formulated within these three branches,
along with an additional branch called "social" [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Each evaluation
criterion, based on one of the branches, asks whether the ontology is "useful" (pragmatic),
whether it can be "read" (syntactic), whether it can be "understood" (semantic), and whether
it can be "trusted" (social). Each of these branches is decomposed into additional aspects that derive
their values from data acquired from the ontology and from external sources.
      </p>
      <p>
        The authors of this paper introduce a Java web application, the Semiotic-based
Evaluation Management System (SEMS), to help ontologists and reviewers measure
the quality of their ontologies using semiotic-inspired metrics.
The authors successfully applied this framework in a previous study of a
patient-centered vaccine ontology [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and in doing so identified ways to streamline
the process in an all-in-one tool. The remaining sections introduce the
implementation of the application and discuss further development for public release.
      </p>
      <sec id="sec-2-3">
        <title>SEMS - Semiotic-based Evaluation Management System</title>
        <p>SEMS is a web application designed to help knowledge engineers assess
the strengths and weaknesses of their ontologies using the evaluation framework
proposed by Burton-Jones et al. The advantages of their framework are that
it is designed to accommodate various users, it is domain independent, and it
is both uncomplicated and comprehensive. Their work included C-based
software, but SEMS would be the first public tool of its type to
combine semiotic-driven ontology evaluation with an online platform that
generates rapid evaluations of formal ontologies, with the aim of promoting uniform
ontology evaluation. SEMS is developed in Java with
a modern HTML5 interface and is hosted on an Apache Tomcat application
server. It will permit ontologists to log in to their accounts and upload their
encoded ontology files. SEMS will then calculate the various scores and allow
the user to invite SMEs to participate in a formal review process to verify the
truthfulness of the ontology.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Prototype</title>
      <p>Currently, the authors have implemented an operational prototype. Figure 1
captures the initial view when the application is launched. First, an ontologist can
specify the preferred evaluation criteria (i.e., pragmatic, syntactic, semantic) for
assessing the ontology. Other options control how preprocessing
handles the annotations from the ontology file. These options include
breaking camel case; removing determiners, brackets, underscores, and dashes;
and specifying whether the ontology uses label annotations or unique
identifiers.</p>
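      <p>As an illustration only, the preprocessing options described above could be sketched along the following lines. This is not the SEMS implementation (which is written in Java); the function name <monospace>clean_label</monospace>, its flags, and the short determiner list are assumptions made for this example, since the paper does not specify them.</p>
      <preformat>
```python
import re

# Assumption: a small illustrative determiner list; the paper does not state
# which determiners SEMS removes.
DETERMINERS = {"a", "an", "the", "this", "that", "these", "those"}


def clean_label(label, split_camel=True, drop_determiners=True,
                drop_brackets=True, drop_underscores=True, drop_dashes=True):
    """Return a cleaned, space-separated version of an ontology label."""
    text = label
    if split_camel:
        # Insert a space at each lower-to-upper boundary ("MildFever" -> "Mild Fever").
        text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    if drop_brackets:
        # Replace bracket characters with spaces; the bracketed text itself is kept.
        text = re.sub(r"[\[\](){}]", " ", text)
    if drop_underscores:
        text = text.replace("_", " ")
    if drop_dashes:
        text = text.replace("-", " ")
    words = text.split()
    if drop_determiners:
        words = [w for w in words if w.lower() not in DETERMINERS]
    return " ".join(words)


if __name__ == "__main__":
    print(clean_label("MildFever"))
    print(clean_label("the_Pizza-Base"))
```
      </preformat>
      <p>Each option is a separate flag so that a user can switch individual cleaning steps on or off, mirroring the configurable options on the screen in Figure 1.</p>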
      <p>Figure 2 shows the screen where the user uploads the ontology file to the server,
along with the results of the preprocessing functions. After the application alerts the
user that the ontology has been uploaded, the user can click "Preprocess", and
the server will output each extracted term alongside its corresponding cleaned
term, based on the configuration. In addition, the number of
word senses for each term is calculated by the WordNet component. For example, a term such
as "Mild" would have three word senses, and a compound term like "Mild
Fever" would have five word senses<sup>1</sup>. Concurrently, the server will also extract
the statements evoked from the ontology as simple natural language statements
for SMEs to evaluate for truthfulness.</p>
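      <p>The word-sense arithmetic in the example above can be sketched as follows. The lookup table is a hypothetical stand-in for the WordNet component and contains only the two counts quoted in the text; the function name <monospace>count_word_senses</monospace> and the choice to count unknown words as zero are assumptions of this sketch.</p>
      <preformat>
```python
# Assumption: a tiny hand-filled table standing in for the WordNet component.
# The two counts are the ones quoted in the text; a real system would query WordNet.
SENSE_COUNTS = {"mild": 3, "fever": 2}


def count_word_senses(term, sense_counts=SENSE_COUNTS):
    """Sum the sense counts of every word in a cleaned term.

    Words missing from the table contribute 0 (an assumption of this sketch).
    """
    return sum(sense_counts.get(word.lower(), 0) for word in term.split())


if __name__ == "__main__":
    print(count_word_senses("Mild"))        # 3
    print(count_word_senses("Mild Fever"))  # 5
```
      </preformat>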
      <p>After the ontology has been processed, the user can navigate using the left
column to tab through the various scores available. Figure 3 shows one of the
screens associated with the syntactic aspect with tabs to navigate to a sub-score.
For each of the scores, the user can choose to include or exclude parts of the
scoring depending on the purpose of the evaluation.</p>
      <p>The tool also facilitates a knowledge base review to
determine the truthfulness of the ontology (Figure 4). An SME can scroll through all
of the statements evoked from the ontology and mark each statement
as true, false, or other. This is done by clicking the "Add Assessment"
button beside the statement, which displays a pop-up for assessing it (Figure 4).</p>
      <p><sup>1</sup> "Mild" has three word senses and "fever" has two, which adds up to a total of five word
senses for "mild fever".</p>
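      <p>One way the SME verdicts from the review screen could be tallied is sketched below. The statements are made up for illustration, and reporting the share of statements judged true as a summary figure is an assumption of this sketch; the paper does not describe how SEMS aggregates the assessments.</p>
      <preformat>
```python
from collections import Counter

VALID_VERDICTS = ("true", "false", "other")


def tally_assessments(assessments):
    """Count SME verdicts and return (counts, share of statements judged true).

    `assessments` maps each statement to one of VALID_VERDICTS.
    """
    for statement, verdict in assessments.items():
        if verdict not in VALID_VERDICTS:
            raise ValueError(f"unknown verdict {verdict!r} for {statement!r}")
    counts = Counter(assessments.values())
    total = len(assessments)
    share_true = counts["true"] / total if total else 0.0
    return counts, share_true


if __name__ == "__main__":
    # Hypothetical statements and verdicts, for illustration only.
    review = {
        "Margherita is a pizza": "true",
        "Mozzarella is a pizza topping": "true",
        "Every pizza has a meat topping": "false",
        "Deep pan base is a pizza": "other",
    }
    counts, share_true = tally_assessments(review)
    print(dict(counts), share_true)
```
      </preformat>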
      <p>
        For a brief test demonstration, we utilized the Wine [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and Pizza [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
ontologies. For the configuration, the tool was set to break camel case and to remove determiners,
underscores, and dashes from the labels of both ontologies. We excluded the
social criteria, not only because the component was under development, but also
because there was insufficient data on the number of ontologies linking to each ontology and
the number of times each has been downloaded. We also excluded the accuracy aspect of
pragmatic quality, because we lacked the expertise to evaluate the specifics of pizzas and
wines and because the review interface is still under development. Figures
5 and 6 show the final overall quality score calculated from the available
pragmatic, semantic, and syntactic scores. Because both of these
are "toy" ontologies used by knowledge engineers, examining their specific scores
is a trivial exercise, but it illustrates how a knowledge engineer could view the
scores and determine areas for improvement. For example, while both ontologies
exhibited nearly equal quality in the syntactic and semantic aspects, the
comprehensiveness (under pragmatic quality) of the Pizza ontology was
lower than that of the Wine ontology: 0.23 and 0.50, respectively. Since
comprehensiveness measures the size of the ontology as an indication of whether it
covers the domain completely, this suggests that the Pizza ontology has proportionally
too few classes and may need to define additional ones. Another
example is richness, the proportion of available ontology features that are
actually used; both ontologies use roughly half of the
available ontology features (0.44 and 0.56). This suggests that the ontologist
may need to consider incorporating more ontological features. Understandably,
knowing the specifics of each measurement may be cumbersome for knowledge
engineers and may defeat the purpose of generating on-demand evaluation
scores. One future possibility is to include automated suggestions and
descriptions of the quality scores, so that knowledge engineers can improve their
ontologies and learn more about the specifics of the metric suite.
      </p>
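      <p>To make the scoring discussion concrete, the following sketch shows a proportion-style score such as richness (features used over features available) and one possible way of combining criterion scores when a criterion, such as social, is excluded. The paper does not give the formulas SEMS uses; the function names, the equal default weights, the renormalisation over available criteria, and all input numbers are assumptions of this example.</p>
      <preformat>
```python
def proportion(used, available):
    """Return used / available as a score in [0, 1]; 0.0 when nothing is available."""
    if available == 0:
        return 0.0
    return used / available


def overall_quality(scores, weights=None):
    """Weighted average of the criterion scores that are present.

    Criteria absent from `scores` (for example an excluded social score) are
    skipped, and the weights of the remaining criteria are renormalised.
    """
    if weights is None:
        weights = {name: 1.0 for name in scores}  # assumption: equal weights
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("at least one criterion needs a positive weight")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


if __name__ == "__main__":
    # Hypothetical inputs, not values reported in the paper.
    print(proportion(5, 10))
    print(overall_quality({"syntactic": 0.5, "semantic": 0.75, "pragmatic": 0.25}))
```
      </preformat>
      <p>Passing only the criteria that were actually scored keeps the overall value on the same 0 to 1 scale regardless of which criteria a user chose to include.</p>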
      <sec id="sec-3-1">
        <title>Upcoming Development and Conclusion</title>
        <p>SEMS was developed to address the ontology evaluation needs of authors
and experts, to encourage ontology usability, and to help systematize and
streamline the evaluation process. SEMS is currently under development, and
further testing and enhancements are underway, including testing the tool
with biomedical ontologies and enlisting biomedical SMEs to
determine accuracy. Some of the immediate areas under development were alluded
to in this paper: user account management, an expert interface for SMEs'
evaluation of statements, flexibility to handle diverse labeling, suggestions for users to
improve their ontologies, etc. Possible future development may also include
integrating with NCBO BioPortal's REST service to directly access ontologies and
community metrics. We also intend to investigate conformity with the Open
Biological and Biomedical Ontologies (OBO) Foundry's standardization requirements and
to address alignment with upper ontologies. While SEMS is a prototype
experimenting with one particular theoretical ontology evaluation framework, there is
much room to explore.</p>
        <p>
          Since the semiotic evaluation suite is adaptable, certain aspects of some of
the criteria were excluded (like the relevance aspect of the pragmatic
criteria<sup>2</sup>) or need to be overhauled for the present ontology community. The
authors have also considered extending the metric suite
to evaluate other features of the ontology that are supported in the ontology
evaluation literature: structural assessment [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], ontology question
answering, and perhaps differential semantics to evaluate entities' parent and sibling
relationships [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. As ontologies play an important role in biomedical research for
Big Data and analytics, there exists a need to validate and evaluate ontologies,
a role that the SEMS application could fill.
        </p>
        <p>
          Acknowledgments. Research was partially supported by the National
Library of Medicine of the National Institutes of Health under Award Number
R01LM011829 and the Cancer Prevention Research Institute of Texas (CPRIT)
Training Grant #RP140103.
        </p>
        <p>
          <sup>2</sup> See [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] for details.
        </p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Protégé, http://protege.stanford.edu/</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. Wine ontology, http://www.w3.org/TR/owl-guide/wine.rdf</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Alani</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brewster</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Metrics for ranking ontologies</article-title>
          (
          <year>2006</year>
          ), http://eprints.ecs.soton.ac.uk/12603
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Almeida</surname>
            ,
            <given-names>M.B.</given-names>
          </string-name>
          :
          <article-title>A proposal to evaluate ontology content</article-title>
          .
          <source>Applied Ontology</source>
          <volume>4</volume>
          (
          <issue>3</issue>
          ),
          <fpage>245</fpage>
          –
          <lpage>265</lpage>
          (
          <year>2009</year>
          ), http://iospress.metapress.com/index/B2L2XT606156H141.pdf
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Amith</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gong</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cunningham</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boom</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tao</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Developing VISO: Vaccine information statement ontology for patient education</article-title>
          .
          <source>Journal of Biomedical Semantics</source>
          <volume>6</volume>
          (
          <issue>1</issue>
          ),
          <fpage>23</fpage>
          (May
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Bachimont</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Isaac</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Troncy</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Semantic commitment for designing ontologies: a proposal</article-title>
          . In:
          <source>Knowledge Engineering and Knowledge Management: Ontologies and the Semantic Web</source>
          , pp.
          <fpage>114</fpage>
          –
          <lpage>121</lpage>
          . Springer (
          <year>2002</year>
          ), http://link.springer.com/chapter/10.1007/3-540-45810-7_14
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hendler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lassila</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          et al.:
          <article-title>The semantic web</article-title>
          .
          <source>Scientific American</source>
          <volume>284</volume>
          (
          <issue>5</issue>
          ),
          <fpage>28</fpage>
          –
          <lpage>37</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Brank</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grobelnik</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mladenic</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>A survey of ontology evaluation techniques</article-title>
          (
          <year>2005</year>
          ), http://eprints.pascal-network.org/archive/00001198/
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Burton-Jones</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Storey</surname>
            ,
            <given-names>V.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sugumaran</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ahluwalia</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>A semiotic metrics suite for assessing the quality of ontologies</article-title>
          .
          <source>Data &amp; Knowledge Engineering</source>
          <volume>55</volume>
          (
          <issue>1</issue>
          ),
          <fpage>84</fpage>
          –
          <lpage>102</lpage>
          (Oct
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Big data: A survey</article-title>
          .
          <source>Mobile Networks and Applications</source>
          <volume>19</volume>
          (
          <issue>2</issue>
          ),
          <fpage>171</fpage>
          –
          <lpage>209</lpage>
          (Apr
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Drummond</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horridge</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stevens</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wroe</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sampaio</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Pizza ontology</article-title>
          . The University of Manchester (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Gantz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reinsel</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Extracting value from chaos</article-title>
          .
          <source>IDC iView</source>
          (1142),
          <fpage>9</fpage>
          –
          <lpage>10</lpage>
          (
          <year>2011</year>
          ), https://www.emcgrandprix.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Hansen</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miron-Shatz</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lau</surname>
            ,
            <given-names>A.Y.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paton</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Big data in science and healthcare: A review of recent literature and perspectives: Contribution of the IMIA social media working group</article-title>
          .
          <source>IMIA Yearbook</source>
          <volume>9</volume>
          (
          <issue>1</issue>
          ),
          <fpage>21</fpage>
          –
          <lpage>26</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Ning</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shihan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Structure-based ontology evaluation</article-title>
          . In:
          <source>e-Business Engineering, 2006. ICEBE'06. IEEE International Conference on</source>
          . pp.
          <fpage>132</fpage>
          –
          <lpage>137</lpage>
          . IEEE (
          <year>2006</year>
          ), http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4031643
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Obrst</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ceusters</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mani</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ray</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>The evaluation of ontologies</article-title>
          .
          In:
          <source>Semantic Web</source>
          , pp.
          <fpage>139</fpage>
          –
          <lpage>158</lpage>
          . Springer (
          <year>2007</year>
          ), http://link.springer.com/chapter/10.1007/978-0-387-48438-9_8
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>