<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Visual presentation of mappings between biomedical ontologies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Simon Kocbek</string-name>
          <email>simon@dbcls.rois.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jean-Luc Perret</string-name>
          <email>jean-luc.perret@novartis.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jin-Dong Kim</string-name>
          <email>jdkim@dbcls.rois.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Database Center for Life Science, Research Organization of Information and Systems</institution>
          ,
          <addr-line>Tokyo</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Novartis Animal Health, Centre de Recherches</institution>
          ,
          <addr-line>St-Aubin</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Ontology mapping focuses on finding correspondences between concepts from different ontologies. As the number of available ontologies increases, so does the number of mappings between them. Visualization techniques can help researchers form a picture of this information. In this paper we present a visualization of mappings between BioPortal ontologies. We present the results as a graph that shows the identified communities of tightly connected ontologies. We use graph analysis techniques such as betweenness centrality and community detection.</p>
      </abstract>
      <kwd-group>
        <kwd>ontology</kwd>
        <kwd>ontology mapping</kwd>
        <kwd>visualization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Creating mappings among ontologies by identifying concepts with similar
meanings is a critical step in integrating data and applications that use different ontologies
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        BioPortal is a web portal developed by The National Center for Biomedical
Ontology (NCBO) that provides access to a library of biomedical ontologies and
terminologies [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The ontologies are published by several different groups (e.g., the OBO
library, and the Proteomics Standards Initiative) and grouped in 40 categories (e.g.,
Anatomy, Cell, and Health). Concepts in BioPortal ontologies often overlap and
information about mappings between ontologies is available. Two ontologies are
mapped when they contain at least one pair of concepts with similar meaning (i.e.,
the concept c1 from the ontology O1 has a similar meaning to the concept c2 from the
ontology O2). Our analysis found more than 30,000 BioPortal mappings. It is hard
for humans to understand and form a picture of so many connected ontologies. In
addition, ontology mappings are often considered in activities such as data
integration, ontology ranking and recommendation [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], or ontology reuse. The latter is also
one of the interests of our group, where we are developing the OntoFinder/Factory
tool<sup>1</sup>, which uses BioPortal ontologies. We therefore believe that it would be useful
to visualize the mappings between biomedical ontologies from BioPortal
in the form of a graph in which each node represents an ontology and each edge
represents the mappings between two ontologies. This kind of graph can provide a macro
view of related biomedical ontologies for researchers who are interested in them.
      </p>
      <p>In the next section we describe our visual analysis of mappings between BioPortal
ontologies. We conclude the paper in Section 3, where we also outline directions for
future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Visualization of BioPortal Mapping Data</title>
      <p>For each BioPortal ontology, we collected the following data through the BioPortal
web services: the ontology’s full name (e.g., Gene Ontology), the ontology’s name
abbreviation (e.g., GO), status of the ontology (e.g., production), the number of
classes in the ontology, and the number of mappings from/to the ontology. Initially,
the data for more than 320 ontologies was collected. However, this number was
reduced to 284 since we filtered out ontologies that: (1) have the retired or alpha status,
(2) contain the keyword test in their name, and (3) are labelled as restricted or private.</p>
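      <p>The three filtering rules above can be sketched as a simple predicate. This is a minimal illustration only: the record type and its field names (status, visibility) are hypothetical and are not taken from the BioPortal web services, and the keyword check is a plain case-insensitive substring match, which is an assumption about how the rule was applied.</p>
      <preformat>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OntologyRecord:
    # Hypothetical record layout; these field names are assumptions and do
    # not mirror the BioPortal web service schema.
    name: str
    abbreviation: str
    status: str      # e.g. "production", "alpha", "retired"
    visibility: str  # e.g. "public", "restricted", "private"


EXCLUDED_STATUSES = {"retired", "alpha"}
EXCLUDED_VISIBILITY = {"restricted", "private"}


def keep_ontology(record: OntologyRecord) -> bool:
    """Return True if the ontology passes all three filtering rules."""
    if record.status.lower() in EXCLUDED_STATUSES:
        return False  # rule (1): retired or alpha status
    if "test" in record.name.lower():
        return False  # rule (2): keyword "test" in the name
    if record.visibility.lower() in EXCLUDED_VISIBILITY:
        return False  # rule (3): restricted or private
    return True


def filter_ontologies(records):
    """Keep only the records that pass keep_ontology, preserving order."""
    return [record for record in records if keep_ontology(record)]


if __name__ == "__main__":
    sample = [
        OntologyRecord("Gene Ontology", "GO", "production", "public"),
        OntologyRecord("My Test Ontology", "MTO", "production", "public"),
    ]
    for record in filter_ontologies(sample):
        print(record.abbreviation)
```
      </preformat>
      <p>A substring match would also discard names that merely contain the letters "test" (for example "Latest"), so a stricter word-level match may be preferable in practice.</p>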
      <p>After collecting the data, we identified 30,560 mappings between 254 ontologies
(i.e., each of these ontologies contained at least one concept mapped to another
ontology). The remaining 30 ontologies had no reference to other ontologies. The majority
of the identified mappings were bidirectional and symmetric: when
an ontology O1 referenced an ontology O2 with x concepts, O2 also
referenced O1 with the same number of concepts. Only 218 asymmetric ontology
pairs were found in our data.</p>
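      <p>The symmetry check described above can be illustrated as follows. The sketch assumes the mapping data is held as a dictionary from (source, target) ontology pairs to the number of mapped concepts; this representation and the function name are hypothetical. A pair is counted as asymmetric when the two directions report different counts, with a missing direction treated as zero.</p>
      <preformat>
```python
def asymmetric_pairs(mapping_counts):
    """Return the unordered ontology pairs whose two directions disagree.

    mapping_counts maps (source, target) tuples to the number of concepts in
    source that are mapped to target (an assumed, illustrative layout).
    """
    pairs = set()
    for (source, target), count in mapping_counts.items():
        # A direction that is absent from the data counts as zero mappings.
        reverse = mapping_counts.get((target, source), 0)
        if count != reverse:
            pairs.add(frozenset((source, target)))
    return pairs


if __name__ == "__main__":
    counts = {
        ("O1", "O2"): 12, ("O2", "O1"): 12,  # symmetric
        ("O1", "O3"): 7, ("O3", "O1"): 5,    # asymmetric counts
    }
    print(len(asymmetric_pairs(counts)))
```
      </preformat>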
      <p>
        We used Gephi, an open source software package for graph analysis and visualization
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], to visualize our data. Gephi provides layout algorithms for drawing large graphs as
well as node and edge filtering capabilities. In addition, a number of graph and node
properties can be calculated with Gephi. For the scope of this paper, the following two
main features were used:
      </p>
      <list list-type="bullet">
        <list-item>
          <p>
            Modularity analysis (or community detection): modularity is a measure of structure in graphs.
Graphs with high modularity have separate communities, with dense connections between
nodes inside a community and sparse connections across communities [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ].
          </p>
        </list-item>
        <list-item>
          <p>
            Betweenness centrality [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] is a measure of how frequently a particular node occurs on the
shortest paths between any two nodes.
          </p>
        </list-item>
      </list>
      <p>[...] a higher number of related concepts. The node size is proportional to the betweenness
centrality metric. The node colour represents membership of one of the communities
detected by modularity analysis.</p>
      <p><sup>1</sup> http://ontofinder.dbcls.jp/</p>
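      <p>To make the betweenness centrality measure concrete, the following is a minimal standalone sketch for an unweighted, undirected graph, using the accumulation scheme proposed by Brandes. It is an illustration rather than the procedure used for the analysis, which relied on Gephi. The adjacency-dictionary input format is an assumption, every node must appear as a key, and the values are left unnormalized.</p>
      <preformat>
```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalized betweenness centrality for an unweighted, undirected graph.

    adjacency maps each node to the set of its neighbours; every node must be
    present as a key (an assumed input format for this sketch).
    """
    centrality = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from the source.
        visit_order = []
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            visit_order.append(v)
            for w in adjacency[v]:
                if distance[w] == -1:  # first time w is reached
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:  # v lies on a shortest path to w
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        # Accumulate dependencies, visiting nodes from farthest to nearest.
        dependency = {node: 0.0 for node in adjacency}
        while visit_order:
            w = visit_order.pop()
            for v in predecessors[w]:
                share = path_count[v] / path_count[w]
                dependency[v] += share * (1.0 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # In an undirected graph every pair of endpoints is counted twice.
    return {node: value / 2.0 for node, value in centrality.items()}


if __name__ == "__main__":
    path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
    print(betweenness_centrality(path))
```
      </preformat>
      <p>In the small path graph of the example, only the middle node lies on a shortest path between two other nodes, so it is the only node with a non-zero score.</p>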
      <p>We obtained a graph density of 0.38 and a modularity of 0.346, which indicate a
relatively homogeneous graph with little structure. Nevertheless, community
detection revealed five communities of interconnected nodes. Two of these
communities clearly correspond to ontologies related to anatomy and to clinical
terms, respectively. These two communities also reflect BioPortal’s category classification, since
the majority of ontologies in each community belong to the same or related categories.
The three other identified communities are more heterogeneous. The graph also
shows that the top three ontologies in terms of betweenness centrality are SNOMEDCT,
NCIt and MSH.</p>
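      <p>The two graph-level figures can be illustrated with a short sketch. The density function assumes a directed graph without self-loops. Under the further assumption, which is not stated in the text, that the 30,560 mappings are directed edges among all 284 retained ontologies, it gives a value that rounds to the reported 0.38. The modularity function implements the standard definition for an unweighted, undirected graph and a given partition; it is not the community detection procedure itself, and Gephi's treatment of edge weights and resolution may differ.</p>
      <preformat>
```python
def directed_density(node_count, edge_count):
    """Density of a directed graph without self-loops: m / (n * (n - 1))."""
    return edge_count / (node_count * (node_count - 1))


def modularity(edges, community_of):
    """Modularity of a partition of an unweighted, undirected graph.

    edges lists each undirected edge once as a (u, v) tuple, without
    self-loops; community_of maps each node to a community label.
    """
    edges = list(edges)
    edge_total = len(edges)
    internal_edges = {}  # edges with both endpoints in the same community
    degree_sum = {}      # sum of node degrees per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal_edges[cu] = internal_edges.get(cu, 0) + 1
    # Q = sum over communities of (internal fraction - expected fraction).
    return sum(
        internal_edges.get(label, 0) / edge_total
        - (degrees / (2 * edge_total)) ** 2
        for label, degrees in degree_sum.items()
    )


if __name__ == "__main__":
    # Assumed reading: 30,560 directed edges among 284 ontologies.
    print(round(directed_density(284, 30560), 2))
```
      </preformat>
      <p>A modularity close to zero means that the partition has barely more internal edges than expected in a random graph with the same degrees, which is consistent with reading the reported 0.346 as a sign of limited community structure.</p>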
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>This work was our first attempt to visualize mapping data from BioPortal, and as
such it opens additional research questions and opportunities. Our graph suggests that
ontologies related to clinical terms and anatomy map their terms much more
than ontologies on other topics. This observation is difficult to interpret; it could be
closely related to the way ontologies in the different domains were built and to the
needs of applications in fields like pathology. In fields where the number of mappings
is large, the present analysis may be useful for learning about ontologies in a particular
context, especially if one of the ontologies of interest is known. In the future, we
would like to analyse the internal structure of ontologies and see whether there is any
connection between the most important terms and the identified communities. Since there are
many plugins available for Gephi, we would also like to experiment with different
add-ons and see whether we can visualize edges in a better way. In addition, a large
portion of the mappings in BioPortal are automatically calculated. It would be
interesting to see how the visualization changes as methods for automatic ontology
mapping and alignment improve.</p>
      <p>Acknowledgments:
The work in this paper was inspired at the BioHackathon 2012 event.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ghazvinian</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Noy</surname>
            ,
            <given-names>N.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Musen</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>Creating mappings for ontologies in biomedicine: simple methods work</article-title>
          .
          <source>AMIA Annu. Symp. proc</source>
          .
          <year>2009</year>
          :
          <fpage>198</fpage>
          -
          <lpage>202</lpage>
          , (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Whetzel</surname>
            ,
            <given-names>P.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Noy</surname>
            ,
            <given-names>N.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>N.H.</given-names>
          </string-name>
          , et al.:
          <article-title>BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications</article-title>
          .
          <source>Nucleic Acids Res</source>
          ;
          <volume>39</volume>
          :
          <fpage>W541</fpage>
          -5, (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Jonquet</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Musen</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>N.H.</given-names>
          </string-name>
          :
          <article-title>Building a biomedical ontology recommender web service</article-title>
          .
          <source>J. Biomed. Sem.;1 Suppl</source>
          <volume>1</volume>
          :
          <fpage>S1</fpage>
          , (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bastian</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heymann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jacomy</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Gephi: an open source software for exploring and manipulating networks</article-title>
          .
          <source>International AAAI Conference on Weblogs and Social Media</source>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>V.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guillaume</surname>
            ,
            <given-names>J.-L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lambiotte</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lefebvre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Fast unfolding of communities in large networks</article-title>
          .
          <source>Journal of Statistical Mechanics: Theory and Experiment</source>
          <volume>10</volume>
          ,
          <issue>P1000</issue>
          , (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Freeman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>A set of measures of centrality based upon betweenness</article-title>
          .
          <source>Sociometry</source>
          <volume>40</volume>
          :
          <fpage>35</fpage>
          -
          <lpage>41</lpage>
          . (
          <year>1977</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>