<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Tawny-SBOL: Using ontologies to design and constrain genetic circuits</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Goksel Misirli</string-name>
          <email>g.misirli@keele.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Phillip Lord</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computing Science, Newcastle University</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computing and Mathematics, Keele University</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Synthetic biology is a data-driven engineering discipline, and designing novel genetic circuits often requires reusing existing information where possible. Semantic Web technologies, particularly ontologies, are important for formalizing knowledge for computational design processes and for facilitating data interoperability. The Synthetic Biology Open Language (SBOL) has already emerged as a data standard and is based on RDF/XML. This language is ideal for representing information as graphs in which nodes and edges are defined using multiple properties. Terms from ontologies and controlled vocabularies are used to indicate the meaning of these properties. A semantic representation of these nodes and edges would simplify both the representation and the querying of the underlying information. Here, we present Tawny-SBOL, a domain-specific language and framework that addresses these issues. Tawny-SBOL is a proof-of-concept project, based on the Tawny-OWL ontology library, for specifying genetic circuit designs. Users can query and potentially constrain these designs. As a result, designs can be evolved based on predefined requirements. Because Tawny-SBOL is built natively on Clojure, users can extend it programmatically and work interactively.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        The Synthetic Biology Open Language (SBOL)
        <xref ref-type="bibr" rid="ref1">(Bartley et al., 2015)</xref>
        has been developed to exchange information about genetic circuits
computationally. Using this language, complex genetic circuits can be
defined in terms of their simpler constituent components, such as DNA,
proteins and small molecules. Designs can be hierarchical, formed of
many sub-designs, and querying the underlying information becomes
challenging because of the complex relationships between components.
Each component may also have additional properties, such as its
intended biological role and its molecular composition.
      </p>
      <p>These details are encoded using RDF, and it can be difficult
to construct SBOL documents manually. Although there are
discussions about adopting the Turtle format in the future, RDF/XML
is currently used, and existing Semantic Web tooling is
particularly valuable. SBOL APIs are already under development
for Java, C, Python and JavaScript. Although these APIs are
necessary to create SBOL documents, using them requires detailed
knowledge of the SBOL data model and of how each SBOL entity is
related to the others. In practice, these APIs are used by
experienced programmers who are expert in the programming language
of their chosen API and who usually follow the development of SBOL
closely.</p>
      <p>Ideally, biologists should use tools built upon these APIs.
However, learning to work with different tools takes time and effort.
A simplified textual representation is another way of sketching
genetic circuit designs and improving them later on. Moreover,
decoupling APIs from complex data structures may further facilitate
the development of useful tools.</p>
      <p>
        ShortBOL
        <xref ref-type="bibr" rid="ref4">(Pocock et al., 2016)</xref>
        was developed specifically as a shorthand language for producing
complex SBOL documents more easily. It is a human-readable textual
language that allows design components and their composition to be
defined. The mechanism behind ShortBOL is template expansion:
templates can be hierarchical, and each template adds further graph
attributes until fully serialized SBOL RDF graphs are created. For
example, to represent a promoter component in SBOL, it must be
declared as a ComponentDefinition with the sbol:type
of biopax:DnaRegion and the sbol:role of SO:0000167
(the Sequence Ontology promoter term). In ShortBOL, a promoter
is already defined as a template, and the ShortBOL compiler
uses this template to inject the required RDF triples.
      </p>
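      <p>The template expansion described above can be illustrated with a minimal sketch. It is not the ShortBOL compiler: the template table, the intermediate template name DnaComponent, the subject name prom1 and the function expand are assumptions made for illustration, and Python is used only as a neutral notation. Each template contributes its own predicate and object pairs after those of its parent, so that a promoter ends up with the three statements named in the text.</p>
      <preformat><![CDATA[
```python
"""Sketch of hierarchical template expansion into RDF-like triples."""

# Hypothetical template table (an assumption, not ShortBOL's real templates).
# Each template may extend a parent and adds its own (predicate, object) pairs.
TEMPLATES = {
    "ComponentDefinition": {
        "extends": None,
        "triples": [("rdf:type", "sbol:ComponentDefinition")],
    },
    "DnaComponent": {  # hypothetical intermediate template
        "extends": "ComponentDefinition",
        "triples": [("sbol:type", "biopax:DnaRegion")],
    },
    "promoter": {
        "extends": "DnaComponent",
        "triples": [("sbol:role", "SO:0000167")],
    },
}


def expand(template_name, subject):
    """Return the triples for `subject`, parent templates first."""
    template = TEMPLATES[template_name]
    triples = []
    if template["extends"] is not None:
        # Recurse so that every ancestor template contributes its attributes.
        triples.extend(expand(template["extends"], subject))
    triples.extend((subject, pred, obj) for pred, obj in template["triples"])
    return triples


promoter_triples = expand("promoter", "prom1")
for triple in promoter_triples:
    print(triple)
```
]]></preformat>
      <p>In this sketch the user only names the promoter template; the type and role statements are supplied by the template chain, which is the effect the text attributes to template expansion.</p>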
      <p>In this work, we present Tawny-SBOL, which builds on the promising
development of ShortBOL and uses ontologies to provide the
meaning of design entities through subsumption. This ontological
representation, based on a domain-specific language (DSL), allows
simple semantic queries to be executed that would be quite complex
if expressed as graph queries. Moreover, rather than creating static
documents to be exchanged between researchers, our aim is to
provide an interactive design environment in which users can create
semantic constraints and queries, and designs can evolve over time.</p>
    </sec>
    <sec id="sec-2">
      <title>THE SBOL ONTOLOGY</title>
      <p>
        Standardised SBOL terms to describe the SBOL data model already
exist. However, these terms are part of a controlled vocabulary that is
embedded in the SBOL specification documents as free text. To enable
an ontological representation of SBOL documents, we created the SBOL
ontology programmatically using Tawny-OWL
        <xref ref-type="bibr" rid="ref2">(Lord, 2013)</xref>
        (Figure 1). We defined classes for
SBOL entities that are represented as RDF resources. Some
SBOL entities are not serialised but act as interfaces that group
others. In this work, superclasses have been defined to represent
these interface entities. Moreover, SBOL-specific terms that are only
referenced to uniquely identify features of SBOL entities have been
represented as classes. These include classes to indicate Access,
Direction and Refinement types in SBOL.
      </p>
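      <p>As a rough illustration of how superclasses for interface entities support subsumption, the following sketch encodes a small class hierarchy and answers subclass questions over it. It is not the Tawny-OWL source of the SBOL ontology. The class names follow the SBOL 2.0 data model, but the choice of fragment, the modelling of Access values as subclasses, and the helper names are assumptions.</p>
      <preformat><![CDATA[
```python
"""Sketch of a class hierarchy with simple subsumption checks."""

# Assumed fragment of a hierarchy, stored as child -> parent.
SUPERCLASS = {
    "TopLevel": "Identified",
    "ComponentDefinition": "TopLevel",
    # ComponentInstance is an interface entity: it groups other entities
    # and is modelled here only as a superclass.
    "ComponentInstance": "Identified",
    "Component": "ComponentInstance",
    "FunctionalComponent": "ComponentInstance",
    # Assumption: each Access value is modelled as a subclass of Access.
    "public": "Access",
    "private": "Access",
}


def ancestors(cls):
    """Yield every superclass of `cls`, nearest first."""
    while cls in SUPERCLASS:
        cls = SUPERCLASS[cls]
        yield cls


def is_subclass_of(cls, parent):
    """True if `parent` subsumes `cls` (directly or transitively)."""
    return parent in ancestors(cls)


def subclasses_of(parent):
    """All classes subsumed by `parent`, in alphabetical order."""
    return sorted(c for c in SUPERCLASS if is_subclass_of(c, parent))


print(subclasses_of("ComponentInstance"))
print(is_subclass_of("ComponentDefinition", "Identified"))
```
]]></preformat>
      <p>A query against the interface class then returns every entity it groups, without listing those entities one by one.</p>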
    </sec>
    <sec id="sec-3">
      <title>TAWNY-SBOL</title>
      <p>Tawny-SBOL provides a simple DSL to create SBOL data. It is
implemented in Clojure and therefore inherits the properties of
this language, such as the use of parentheses to delimit a block of
SBOL data or to perform a specific action, for example saving the
results. A dedicated Tawny-SBOL keyword indicates the type of
a simple biological component, such as a promoter, coding sequence,
ribosome binding site or terminator. Complex designs
formed of simple components are represented using the design
command. This command takes a parameter specified according to
a grammar (Figure 2).</p>
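      <p>To make the shape of the design parameter concrete, the sketch below parses a string of the form used in the example of Figure 2: a design name followed by component placements written as start..end:strand. The pattern is inferred from that single example and is an assumption, not the actual Tawny-SBOL grammar; Placement and parse_design are hypothetical names, and Python is used only as a neutral notation.</p>
      <preformat><![CDATA[
```python
"""Sketch of a parser for a design string such as the one in Figure 2."""
import re
from typing import NamedTuple


class Placement(NamedTuple):
    component: str
    start: int
    end: int
    strand: str


# One placement: "<component> <start>..<end>:<strand>" (assumed format).
PLACEMENT = re.compile(r"(\w+)\s+(\d+)\.\.(\d+)\s*:\s*([+-])")


def parse_design(text):
    """Split a design string into its name and a list of placements."""
    name, _, rest = text.strip().partition(" ")
    placements = [
        Placement(m[1], int(m[2]), int(m[3]), m[4])
        for m in PLACEMENT.finditer(rest)
    ]
    # Anything left over after removing the placements is a syntax error.
    leftover = PLACEMENT.sub("", rest).strip()
    if leftover:
        raise ValueError(f"unrecognised design text: {leftover!r}")
    return name, placements


design_name, design_parts = parse_design(
    "lacI_expression prom1 1..40:+ rbs1 41..50:+ lacI 51..800:+ term1 801..850:+"
)
print(design_name)
for part in design_parts:
    print(part)
```
]]></preformat>
      <p>Keeping the positions and strand as structured values, rather than as text, is what would allow later steps to check constraints such as overlapping ranges.</p>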
      <p>The resulting files include not only SBOL-specific information
but also additional classes that facilitate the use of semantic
reasoners. These classes are injected by Tawny-SBOL. Currently,
queries can be written using the OWL syntax and executed directly
using the Tawny-OWL framework. For example, the first
query in Figure 3 lists promoter resources, which are represented
using SBOL’s ComponentDefinition entity and have the role
of the SO:0000167 term. In the second query, all the parents of the
lac1 component are queried. The query in this case is an ontology
class named lac1Parent and is used to recursively find all the
uses of the child component in parent designs. In the future, we will
further simplify the querying process by introducing SBOL-specific
commands in Tawny-SBOL.</p>
      <p>Figure 2 (excerpt):</p>
      <preformat>)
...
(cds "lacI"
  {name "lacI",
   description "lacI coding sequence",
   designedBy "..."
  })
(design "lacI_expression prom1 1..40:+ rbs1 41..50:+ lacI 51..800:+ term1 801..850:+")
(save "lacI_expression")</preformat>
      <p>Figure 3 (queries):</p>
      <preformat>ComponentDefinition and (role some SO:0000167)
ComponentDefinition and ((component some lac1) or (component some lac1Parent))</preformat>
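      <p>The logic of the two queries can be sketched outside OWL as follows. This is only an approximation under stated assumptions: in Tawny-SBOL the answers come from a reasoner classifying the ontology, whereas here a plain traversal stands in for it. The design data, including the toggle_switch design, and the function names are hypothetical.</p>
      <preformat><![CDATA[
```python
"""Sketch of the role query and the recursive parent query."""

# Hypothetical design data: each entry has an optional role and the
# components it directly contains.
DESIGNS = {
    "prom1": {"role": "SO:0000167", "components": []},
    "lac1": {"role": "SO:0000316", "components": []},
    "lacI_expression": {"role": None, "components": ["prom1", "lac1"]},
    "toggle_switch": {"role": None, "components": ["lacI_expression"]},
}


def with_role(role):
    """First query: every definition that has the given role."""
    return sorted(name for name, d in DESIGNS.items() if d["role"] == role)


def parents_of(child):
    """Second query: designs using `child` directly or via a sub-design."""
    found = set()
    frontier = [child]
    while frontier:
        current = frontier.pop()
        for name, d in DESIGNS.items():
            if current in d["components"] and name not in found:
                # A parent of a parent is also a parent, so keep walking up.
                found.add(name)
                frontier.append(name)
    return sorted(found)


promoters = with_role("SO:0000167")
lac1_parents = parents_of("lac1")
print(promoters)
print(lac1_parents)
```
]]></preformat>
      <p>The second function mirrors the two branches of the lac1Parent class: a design qualifies if it contains lac1 itself, or if it contains something that already qualifies.</p>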
    </sec>
    <sec id="sec-4">
      <title>CONCLUSION</title>
      <p>
        Ontologies can be extremely useful for capturing domain knowledge
and for executing logical queries in synthetic biology
        <xref ref-type="bibr" rid="ref3">(Misirli et al., 2016)</xref>
        . Tawny-SBOL has been developed to exploit these features
for the ontological representation of genetic circuit designs. Here,
we introduced the SBOL ontology together with a human-readable
textual DSL for SBOL. This DSL is based on Tawny-OWL and
the Clojure programming language, providing users with an extensible
and interactive environment in which to add new design information
as it becomes available, to query design information, and to create
logical constraints. As the design-build-test cycle of engineering
biological systems can take several iterations and span long
timescales, this constraint-based approach will help to achieve
desired systems and to evolve designs in a controlled manner.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Bartley</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beal</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clancy</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Misirli</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roehner</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oberortner</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pocock</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bissell</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Madsen</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gennari</surname>
            ,
            <given-names>J. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Myers</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wipat</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Sauro</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Synthetic Biology Open Language (SBOL) Version 2.0.0</article-title>
          .
          <source>Journal of Integrative Bioinformatics</source>
          ,
          <volume>12</volume>
          (
          <issue>2</issue>
          ),
          <fpage>272</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Lord</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>The semantic web takes wing: Programming ontologies with Tawny-OWL</article-title>
          .
          <source>arXiv preprint arXiv:1303.0213</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Misirli</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hallinan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pocock</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lord</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McLaughlin</surname>
            ,
            <given-names>J. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sauro</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Wipat</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Data integration and mining for synthetic biology design</article-title>
          .
          <source>ACS Synthetic Biology</source>
          ,
          <volume>5</volume>
          (
          <issue>10</issue>
          ),
          <fpage>1086</fpage>
          -
          <lpage>1097</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Pocock</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taylor</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McLaughlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Misirli</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Wipat</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>ShortBOL: A shorthand for SBOL</article-title>
          .
          <source>In 8th International Workshop on Bio-Design Automation.</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>