<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Molecular symmetry and specialization of atomic connectivity by class-based reasoning of chemical structure</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michel Dumontier</string-name>
          <email>michel_dumontier@carleton.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Biology, Carleton University</institution>
          ,
          <addr-line>1125 Colonel By Drive, Ottawa, Ontario</addr-line>
          ,
          <country country="CA">Canada</country>
          <addr-line>K1S 5B6</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Chemical biology and drug discovery seek to uncover the relationship between chemical structure and function. In the context of the emerging life science semantic web, we have previously investigated multiple strategies for the representation and reasoning of chemical structure, functional groups and chemical attributes using RDF, OWL, SWRL and so-called Description Graphs. Here, we continue our investigation of the representation of molecular structure, using a class-based approach to infer molecular symmetry and specialization of atomic connectivity. This work provides new design patterns for representing and reasoning about structured objects.</p>
      </abstract>
      <kwd-group>
        <kwd>chemical</kwd>
        <kwd>graph</kwd>
        <kwd>representation</kwd>
        <kwd>design pattern</kwd>
        <kwd>OWL</kwd>
        <kwd>chemoinformatics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
Chemical biology and drug discovery seek to uncover the relationship between
chemical structure and function. Quantitative structure-activity relationship (QSAR)
studies correlate so-called descriptors, or aspects of chemical structure, with activity
or reactivity, in the hope of identifying other reactive molecules in the absence of
experimental results. Hundreds of descriptors are now used in QSAR studies, and efforts
have been made to capture these under a common ontology of chemical information
[1]. In this ontology, descriptors can be associated with the structural parts or qualities
that they pertain to [2], thereby enhancing the potential for enrichment analyses over
the emerging life science semantic web [3, 4].</p>
      <p>In the context of building an emerging semantic web for the life sciences, we
previously investigated multiple strategies for the representation of chemical structure
for the purpose of semantic annotation, classification and question answering. While a
class-based representation [5] was used to describe and classify chemicals by their
functional groups (chemical substructures), we were only able to represent molecules
containing cycles as instance data, which could be queried using SPARQL or a
rule-based formalism such as SWRL. Such an instance-based description also meant that
specific molecules could not be compared in terms of their structural descriptions. More
recent work [5] investigated the use of Description Graphs as a means to perform chemical
classification at the class level, albeit with significant limitations [7].</p>
      <p>In this work, we pursued a class-based axiomatic representation of chemical structure
using OWL, which captures object structures containing cycles. Using automated
reasoning, we infer molecular symmetry and specialization of atomic connectivity, an
important aspect of reasoning over functional groups. This work provides new design
patterns to generate insight into reasoning over structured objects in OWL.
</p>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <p>We considered the six molecules illustrated in Figure 1 (butane, pentane, isobutane,
isopentane, cyclobutane and cyclohexane), as they include linear, forked and cyclic molecules.</p>
      <p>Non-hydrogen molecular connectivity tables specified in SDF files were converted to
OWL using a PHP-based SDF file parser and our PHP-based OWL API. The ontology is
available at http://goo.gl/3qif3, and scripts are available on request. Ontologies were
reasoned over and queried using Protege 4.2 (build 269) with the HermiT and FaCT++
reasoners and the built-in explanation workbench.</p>
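      <p>The conversion step can be sketched as follows. This is a minimal Python illustration, not the PHP scripts used in this work: it assumes the fixed-column V2000 molfile layout (counts line, atom block, bond block) inside each SDF record, and the function name read_heavy_atom_connectivity and the mapping of bond-type codes 1 to 4 onto the bond relations named in the Results are our own assumptions.</p>
      <preformat preformat-type="code">
```python
# Sketch (assumption): parse one V2000 molfile record and keep only the
# non-hydrogen connectivity table. V3000 records and SDF data fields are ignored.

# hypothetical mapping from V2000 bond-type codes to the bond relations
BOND_RELATIONS = {
    1: "has single bond with",
    2: "has double bond with",
    3: "has triple bond with",
    4: "has aromatic bond with",
}


def read_heavy_atom_connectivity(molfile_text):
    """Return (elements, bonds) with hydrogens removed.

    elements maps the original 1-based atom number to its element symbol;
    bonds is a list of (atom_a, atom_b, relation) triples.
    """
    lines = molfile_text.splitlines()
    counts = lines[3]                      # line 4 of a molfile is the counts line
    n_atoms = int(counts[0:3])
    n_bonds = int(counts[3:6])

    elements = {}
    for i in range(n_atoms):
        line = lines[4 + i]
        symbol = line[31:34].strip()       # atom symbol occupies columns 32-34
        if symbol != "H":
            elements[i + 1] = symbol

    bonds = []
    for j in range(n_bonds):
        line = lines[4 + n_atoms + j]
        a, b, code = int(line[0:3]), int(line[3:6]), int(line[6:9])
        if a in elements and b in elements:  # drop any bond to a hydrogen
            bonds.append((a, b, BOND_RELATIONS[code]))
    return elements, bonds
```
</preformat>
      <p>Multi-record SDF files separate records with lines reading $$$$, so each record would be split off before calling the function; charges, isotopes and stereo flags are ignored in this sketch.</p>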
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <sec id="sec-3-1">
        <title>Formalization</title>
        <p>At the core of this work is the formalization of the relations that hold between a
molecule, its atomic parts, and the connectivity between these atoms. We first create a
named entity that exactly represents a fully connected atom, but which is not associated
with any particular molecule. The general pattern specifies one qualified cardinality
restriction for each target atom:</p>
        <preformat>`fully connected atom M`
  equivalentTo
    `atom type`
    and `has bond with` exactly 1 `fully connected atom N`
    and ...</preformat>
        <p>where `atom type` refers to a specific kind of atom (e.g. carbon atom), `has bond with`
stands for a more specific relation (`has single bond with`, `has double bond with`, `has
triple bond with` or `has aromatic bond with`) and N specifies the target atom. In this way,
every connection between M and other atoms is captured in the definition of M.</p>
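        <p>A hypothetical generator for the fully connected atom pattern is sketched below in Python. It takes a heavy-atom connectivity table and emits axioms in the backtick notation used above rather than in any official OWL syntax. The naming scheme (e.g. `fully connected atom isobutane-2`), the ELEMENT_CLASS mapping and the function names are assumptions made for illustration.</p>
        <preformat preformat-type="code">
```python
# Sketch (assumption): emit molecule-independent `fully connected atom`
# definitions in the notation of the text. Names here are hypothetical.

ELEMENT_CLASS = {"C": "carbon atom", "N": "nitrogen atom", "O": "oxygen atom"}


def fca_name(molecule, atom):
    return f"`fully connected atom {molecule}-{atom}`"


def fully_connected_atom_axioms(molecule, elements, bonds):
    """Return one equivalent-class axiom per heavy atom, in atom-number order.

    elements: {atom number: element symbol}
    bonds: [(atom_a, atom_b, relation)] with each bond listed once
    """
    neighbours = {atom: [] for atom in elements}
    for a, b, relation in bonds:
        # record each bond on both atoms so that every definition lists
        # all the atoms it is connected to
        neighbours[a].append((relation, b))
        neighbours[b].append((relation, a))

    axioms = []
    for atom in sorted(elements):
        parts = [f"`{ELEMENT_CLASS[elements[atom]]}`"]
        for relation, other in sorted(neighbours[atom], key=lambda p: p[1]):
            # one qualified exact-cardinality restriction per target atom
            parts.append(f"`{relation}` exactly 1 {fca_name(molecule, other)}")
        axioms.append(fca_name(molecule, atom) + "\n  equivalentTo " + "\n  and ".join(parts))
    return axioms
```
</preformat>
        <p>For the central carbon of isobutane this yields three restrictions, one per terminal carbon, while each terminal carbon receives a single restriction pointing back to the central atom.</p>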
        <p>This molecule-independent atomic connectivity is then used as a base description for
specifying molecule-associated atoms. The following equivalent class axiom states the
molecule of which the atom is an intrinsic part and, through `fully connected atom M`,
what the atom is necessarily connected to:</p>
        <preformat>`atom X from molecule A`
  equivalentTo
    `fully connected atom M`
    and `is component part of` some `molecule A`</preformat>
        <p>In addition, we can formalize the definition of a molecule in terms of its fully connected
atomic parts:</p>
        <preformat>`molecule A`
  equivalentTo
    `molecule`
    and `has component part` some `atom X from molecule A`
    and ...</preformat>
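        <p>The two molecule-specific patterns can likewise be generated mechanically. The Python sketch below is hypothetical: it assumes atoms are identified by number within a named molecule and reuses the naming scheme `fully connected atom butane-1` for the molecule-independent classes.</p>
        <preformat preformat-type="code">
```python
# Sketch (assumption): emit the `atom X from molecule A` and `molecule A`
# patterns in the notation of the text. Class names are hypothetical.


def molecule_axioms(molecule, atom_numbers):
    def fca(n):
        return f"`fully connected atom {molecule}-{n}`"

    def part(n):
        return f"`atom {n} from molecule {molecule}`"

    mol = f"`molecule {molecule}`"
    axioms = []
    for n in sorted(atom_numbers):
        # the molecule-specific atom inherits all connectivity from its
        # molecule-independent fully connected atom and adds part-hood
        axioms.append(f"{part(n)}\n  equivalentTo {fca(n)}\n  and `is component part of` some {mol}")

    # the molecule is defined by the existence of each of its atomic parts
    parts = [f"`has component part` some {part(n)}" for n in sorted(atom_numbers)]
    axioms.append(f"{mol}\n  equivalentTo `molecule`\n  and " + "\n  and ".join(parts))
    return axioms
```
</preformat>
        <p>Keeping connectivity in the molecule-independent classes means the molecule-specific atoms differ only in their part-hood restriction, which is what allows atoms of different molecules to be compared by connectivity alone.</p>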
      </sec>
      <sec id="sec-3-2">
        <title>Molecular Symmetry</title>
        <p>Given the equivalent class axioms specified above, an OWL reasoner will identify
equivalent atoms if and only if they are of the same atom type and are connected, by the
same kinds of bond, to exactly the same atoms and no others. On reasoning, we discover
equivalence between atoms 2, 3 and 4 of isobutane, between atoms 4 and 5 of isopentane, and,
in cyclobutane, between atoms 1 and 3 and between atoms 2 and 4. In the case of isobutane and
isopentane, the equivalence occurs among the terminal, single-bonded carbon atoms (three in
isobutane, two in isopentane) that are attached to a common atom. For instance, the
explanation of why atoms 2 and 3 of isobutane are equivalent is provided through the
explanation workbench; it follows from their identical definitions, as each is a carbon atom
with exactly one single bond to fully connected atom 1.
In the case of cyclobutane, atoms 1 and 3 are both bonded to atoms 2 and 4, while atoms 2 and 4
are both bonded to atoms 1 and 3. So, although every atom in the ring is bonded to two carbon
atoms, atoms 1 and 3 are not equivalent to atoms 2 and 4, because they are bonded to different
specific atoms; only part of the ring's symmetry is recovered. However, despite its cyclic
symmetry, no equivalence is detected in cyclohexane, because no two of its atoms are bonded
to exactly the same pair of atoms.</p>
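        <p>Why these particular atoms are found equivalent can be illustrated with a structural shortcut: under the pattern above, two fully connected atoms have identical definitions when they share the same atom type and the same set of (bond relation, neighbour) pairs. The Python sketch below groups atoms on that basis. It is an approximation, not a DL reasoner, and can miss entailments that HermiT or FaCT++ would derive; the function names are hypothetical.</p>
        <preformat preformat-type="code">
```python
# Sketch (assumption): detect atoms whose `fully connected atom` definitions
# are syntactically identical. This compares definitions; it does no DL reasoning.
from collections import defaultdict


def definition(atom, elements, bonds):
    """Conjuncts of the atom's definition: its type plus (relation, neighbour) pairs."""
    conjuncts = {("type", elements[atom])}
    for a, b, relation in bonds:
        if atom == a:
            conjuncts.add((relation, b))
        elif atom == b:
            conjuncts.add((relation, a))
    return frozenset(conjuncts)


def equivalent_atoms(elements, bonds):
    """Groups of two or more atoms whose definitions are identical."""
    groups = defaultdict(list)
    for atom in sorted(elements):
        groups[definition(atom, elements, bonds)].append(atom)
    return sorted(g for g in groups.values() if len(g) > 1)
```
</preformat>
        <p>With the atom numbering implied by the text (assumed here: atom 1 is the central carbon of isobutane, and isopentane is C1-C2-C3 with C4 and C5 on C3), it groups atoms 2, 3 and 4 of isobutane, atoms 4 and 5 of isopentane and atoms 1/3 and 2/4 of cyclobutane, and finds no group in cyclohexane.</p>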
      </sec>
      <sec id="sec-3-3">
        <title>Molecular Specialization</title>
        <p>Since our representation does not include hydrogen atoms, certain atoms have fewer
connected atoms than they would normally. For instance, a terminal methyl carbon would
normally be connected to one carbon and three hydrogens, but in this representation it has
only a single connection, to a carbon. Thus, an atom is more specialized than another if it
has all the connections of the other atom and more. In our sample dataset, we obtain atomic
specializations for butane (C3 SubClassOf C1; C2 SubClassOf C4), pentane (C3 SubClassOf C1;
C3 SubClassOf C5) and isopentane (C3 SubClassOf C1; C2 SubClassOf C4; C2 SubClassOf C5).
In isopentane, C2 is a subclass of both C4 and C5 because C4 and C5 are each connected only
to C3, whereas C2 is connected to both C3 and C1; C2 is therefore a more specific kind of
C4/C5 atom with respect to its connectivity.
In the case of cyclohexane, no specialization is observed, since every atom is bonded to
exactly two carbon atoms and no atom is bonded to all the neighbours of another.</p>
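        <p>Under the same structural reading, atom P is a specialization of atom Q when P's definition contains every conjunct of Q's definition plus at least one more. The Python sketch below, a hypothetical helper rather than the reasoner's algorithm, enumerates such pairs.</p>
        <preformat preformat-type="code">
```python
# Sketch (assumption): report (sub, super) pairs where the sub atom's
# definition strictly extends the super atom's. Structural test only.


def definition(atom, elements, bonds):
    """Conjuncts of the atom's definition: its type plus (relation, neighbour) pairs."""
    conjuncts = {("type", elements[atom])}
    for a, b, relation in bonds:
        if atom == a:
            conjuncts.add((relation, b))
        elif atom == b:
            conjuncts.add((relation, a))
    return frozenset(conjuncts)


def specializations(elements, bonds):
    """Return (sub, super) pairs in atom-number order."""
    defs = {atom: definition(atom, elements, bonds) for atom in elements}
    pairs = []
    for sub in sorted(defs):
        for sup in sorted(defs):
            # for frozensets, '>' tests for a proper superset
            if defs[sub] > defs[sup]:
                pairs.append((sub, sup))
    return pairs
```
</preformat>
        <p>Using the same assumed numbering, it returns (C2, C4) and (C3, C1) for butane, (C3, C1) and (C3, C5) for pentane, (C2, C4), (C2, C5) and (C3, C1) for isopentane, and no pairs for cyclohexane, which matches the specializations listed above.</p>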
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Discussion</title>
      <p>In this preliminary work, we investigated a representation of molecular structure for the purpose
of class-based reasoning about atomic connectivity. Our design pattern involved the declaration
of a `fully connected atom` class, which captures the bond connectivity of an atom independently
of any molecule, and in terms of which the atoms of a specific molecule are then described. Our
representation makes it possible to analyze acyclic or cyclic molecules for the presence of
symmetry, with the caveat that atoms are only found to be equivalent when they share exactly the
same connectivity. Moreover, where the connectivity of one atom is a subset of that of another,
the pair may exhibit subclass specialization.</p>
      <p>Because it is specified using equivalent class axioms involving exact cardinality restrictions,
the representation falls outside OWL 2 EL but within OWL 2 DL (ALEQ). Given that we currently
investigate only a single molecule at a time, we expect good reasoner performance even for
significantly larger and more highly connected molecules.</p>
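      <p>For reference, the three axiom patterns of the Formalization section can be written in description logic notation as follows; the abbreviated names (FCA for `fully connected atom`, hasBondWith, and so on) are ours. The qualified number restriction (=1 R.C) is not among the OWL 2 EL constructors, which is why the representation falls outside that profile.</p>
      <preformat preformat-type="latex">
```latex
\begin{gather}
\mathit{FCA}_M \equiv \mathit{AtomType} \sqcap ({=}1\,\mathit{hasBondWith}.\mathit{FCA}_N) \sqcap \cdots \\
\mathit{Atom}_{X,A} \equiv \mathit{FCA}_M \sqcap \exists\,\mathit{isComponentPartOf}.\mathit{Molecule}_A \\
\mathit{Molecule}_A \equiv \mathit{Molecule} \sqcap \exists\,\mathit{hasComponentPart}.\mathit{Atom}_{X,A} \sqcap \cdots
\end{gather}
```
</preformat>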
      <p>An ongoing challenge lies in the representation of cyclic structures, or so-called structured
objects. While our representation is able to describe cyclic structures at the class level, we note
that the representation is incomplete and admits unintended models. Problematically,
some models consist of infinite chains of carbon atoms. Thus, it is trivial for 4-carbon
butane to satisfy a DL query asking for 10 connected carbon atoms. The system is highly
underconstrained and may produce erroneous answers to queries.</p>
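      <p>The underconstraint can be demonstrated with a small finite model check. To keep the check short, the Python sketch below uses a hypothetical two-atom, ethane-like pair of definitions in the same pattern (not taken from this work), and it assumes the bond relation is an ordinary, non-symmetric object property. It verifies that a ring of ten carbon atoms, alternately assigned to the two classes, satisfies both equivalence axioms, so nothing in the definitions bounds the number of connected carbons.</p>
      <preformat preformat-type="code">
```python
# Sketch (assumption): brute-force check that a finite interpretation satisfies
# hypothetical `fully connected atom` definitions. Bonds are treated as a plain,
# non-symmetric relation; the two definitions are a toy, not from the paper.

# class name -> (required atom type, {target class: required exact count})
DEFINITIONS = {
    "FCA1": ("carbon", {"FCA2": 1}),
    "FCA2": ("carbon", {"FCA1": 1}),
}


def satisfies_definitions(atom_type, bonds, classes):
    """True if, for every class, its extension equals that of its definition.

    atom_type: {individual: type}; bonds: {individual: [bonded individuals]};
    classes: {class name: set of individuals}.
    """
    for name, (required_type, restrictions) in DEFINITIONS.items():
        # extension of the right-hand side of the equivalence axiom
        rhs = set()
        for x, t in atom_type.items():
            counts_ok = all(
                sum(1 for y in bonds.get(x, []) if y in classes[target]) == n
                for target, n in restrictions.items()
            )
            if t == required_type and counts_ok:
                rhs.add(x)
        if rhs != classes[name]:
            return False
    return True


# Ten carbons bonded in a ring a0 -> a1 -> ... -> a9 -> a0, alternating classes.
ring = [f"a{i}" for i in range(10)]
atom_type = {x: "carbon" for x in ring}
bonds = {ring[i]: [ring[(i + 1) % 10]] for i in range(10)}
classes = {"FCA1": set(ring[0::2]), "FCA2": set(ring[1::2])}
print(satisfies_definitions(atom_type, bonds, classes))  # prints True
```
</preformat>
      <p>Replacing the ring by a chain that never closes gives the kind of infinite model described above; a reasoner has no grounds to exclude either interpretation.</p>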
      <p>Despite these difficulties, a key goal remains the inference of equivalent atoms in cyclic
structures such as cyclobutane and cyclohexane. An alternative to the approach taken here may
lie in the generation of identifiers (URIs) for atoms based solely on their descriptions, as
described in our prior work [6]. We expect that the combination of unique structural
identifiers with the molecule-free atomic descriptions may be sufficient to infer equivalent
atoms and identify structurally equivalent molecules.</p>
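      <p>One way such description-based identifiers might be generated is iterative neighbourhood refinement, as in Morgan-style canonical numbering or one-dimensional Weisfeiler-Lehman colour refinement. The Python sketch below uses that swapped-in technique for illustration only; it is not claimed to be the scheme of [6], and the function names and the use of truncated SHA-1 digests are assumptions. Atoms related by a symmetry of the molecule always receive the same identifier, although for some highly regular graphs non-equivalent atoms can also coincide.</p>
      <preformat preformat-type="code">
```python
# Sketch (assumption): description-based atom identifiers via colour refinement.
# Atom numbers never enter a label, so identifiers depend only on structure.
import hashlib


def _digest(text):
    # short, URI-friendly digest of a label
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:12]


def refined_identifiers(elements, bonds, rounds=None):
    """Return {atom number: identifier}.

    elements: {atom number: element symbol}
    bonds: [(atom_a, atom_b, relation)] with each bond listed once
    """
    neighbours = {atom: [] for atom in elements}
    for a, b, relation in bonds:
        neighbours[a].append((relation, b))
        neighbours[b].append((relation, a))

    labels = {atom: _digest(elements[atom]) for atom in elements}
    # n rounds are always enough for the partition of atoms to stop changing
    for _ in range(rounds if rounds is not None else len(elements)):
        labels = {
            atom: _digest(
                labels[atom] + "|" + ",".join(sorted(f"{rel}:{labels[n]}" for rel, n in neighbours[atom]))
            )
            for atom in elements
        }
    return labels


def symmetry_classes(identifiers):
    """Group atom numbers that share an identifier."""
    groups = {}
    for atom, ident in sorted(identifiers.items()):
        groups.setdefault(ident, []).append(atom)
    return sorted(groups.values())
```
</preformat>
      <p>Unlike the comparison of definitions, this assigns one identifier to all six cyclohexane atoms and to all four cyclobutane atoms, while still separating the central and terminal carbons of isobutane.</p>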
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>We would like to thank the anonymous reviewers for providing valuable feedback which
improved the quality of this manuscript. Research and travel are supported by an NSERC
Discovery Grant.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <article-title>The chemical information ontology: provenance and disambiguation for chemical data on the biological semantic web</article-title>
          .
          <source>PLoS One</source>
          ,
          <year>2011</year>
          .
          <volume>6</volume>
          (
          <issue>10</issue>
          ): p.
          <fpage>e25513</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Konyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>De Leon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          .
          <article-title>Chemical Knowledge for the Semantic Web</article-title>
          .
          <source>in DILS</source>
          .
          <year>2008</year>
          . Evry, France: Springer.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>M.A.</given-names>
            <surname>Nolin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Belleau</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Corbeil</surname>
          </string-name>
          ,
          <article-title>Building an HIV data mashup using Bio2RDF</article-title>
          .
          <source>Briefings in Bioinformatics</source>
          ,
          <year>2012</year>
          .
          <volume>13</volume>
          (
          <issue>1</issue>
          ): p.
          <fpage>98</fpage>
          -
          <lpage>106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>B.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ding</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.J.</given-names>
            <surname>Wild</surname>
          </string-name>
          ,
          <article-title>Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data</article-title>
          .
          <source>BMC bioinformatics</source>
          ,
          <year>2010</year>
          .
          <volume>11</volume>
          : p.
          <fpage>255</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Stevens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Horne</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Britz</surname>
          </string-name>
          ,
          <article-title>Representing chemicals using OWL, description graphs and rules</article-title>
          , in
          <source>7th International Workshop on OWL: Experiences and Directions</source>
          : San Francisco, California, USA. p.
          <fpage>10</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>L.</given-names>
            <surname>Chepelev</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <article-title>Increasingly accurate biochemical knowledge representation with precise, structure-based chemical identifiers</article-title>
          ,
          <source>in Proceedings of the International Workshop on Bio-ontologies</source>
          .
          <year>2009</year>
          : Stockholm, Sweden.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>