<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>GReg : a domain specific language for the modeling of genetic regulatory mechanisms</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nicolas Sedlmajer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Didier Buchs</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Steve Hostettler</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alban Linard</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Edmundo Lopez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexis Marechal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Université de Genève</institution>
          ,
          <addr-line>7 route de Drize, 1227 Carouge</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <volume>724</volume>
      <fpage>21</fpage>
      <lpage>35</lpage>
      <abstract>
        <p>Chemical and biological systems have similarities with ITsystems as they can be observed as sequences of events. Most available tools propose simulation frameworks to explore biological pathways (i.e., sequences of events). Simulation only explores a few of the most probable pathways in the system. On the contrary, techniques such as model checking, coming from IT-systems analysis, explore all the possible behaviors of the modeled systems, thus helping to identify interesting pathways. A main drawback from most model checking tools in the life sciences domain is that they take as input a language designed for computer scientists, that is not easily understood by non-expert users. We propose in this article an approach based on Domain Specific Languages. It provides a comprehensible language to describe the system while allowing the use of complex and powerful underlying model checking techniques.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Because of their stochastic and combinatorial nature, many biological systems
such as cellular and supra-cellular interactions are very hard to investigate.
Current practice is mainly limited to the use of in vivo and in vitro experiments.
Investigation through formal models of biological systems is currently a rather
restricted research field, unlike what has been done in other natural sciences
such as chemistry and physics. There is clearly an emerging field of research
where future experiments can be partially performed in silico, i.e., by means
of techniques from computer science. One of the main approaches of biological
modeling is the so-called regulatory networks [
        <xref ref-type="bibr" rid="ref17 ref5">17,5</xref>
        ]. The main idea of biological
modeling according to the regulatory network approach is to model
interbiological reactions through a set of interdependent biological rules. This can be seen as
a set of discrete modules having strong interconnections. The occurrence of
interesting events in the biological system can be represented as logical properties
expressed on the state of these modules. This is very similar to the kind of
properties computer scientists validate on hardware and software systems (deadlocks,
error states,. . . ).
      </p>
      <p>Among the tools available in this domain, the main analysis approach for
regulatory networks is simulation. Simulation is generating and analyzing a limited
sample of possible system behaviors. This technique is not convenient when the
main purpose of the research is to look for rare or abnormal behaviors (e.g.,
cancer). The main approach in this case is to use model checking instead of
simulation. Model checking consists in generating and analyzing the complete set of
possible states of the system. Naturally, this technique suffers from the drawback
of the enormous number of possible states of biological systems.</p>
      <p>
        It is interesting to note that this problem is well-known to the model checking
community in computer science, where it is called the state space explosion [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
There is a parallel between cellular interactions and software systems in that the
state space explosion is mainly due to their concurrent nature. Therefore, we can
apply techniques that have been developed for the model checking of hardware
and software systems to biological interactions. Approaches based on a symbolic
encoding of the state space are particularly well-suited for this [
        <xref ref-type="bibr" rid="ref4 ref9">4,9</xref>
        ].
      </p>
      <p>In this paper we show a work in progress in our group. We present Gene
Regulation Language (GReg), our first attempt to build a framework for
modeling and analyzing biological systems based on formal modeling and reasoning.
Advanced techniques for defining Domain Specific Languages, giving their
semantics and analyzing them using symbolic model checking are presented. First,
Section 2 describes precisely the biological domain considered by GReg, then
Section 3 outlines the state of the current research in the field and Section 4
describes in detail the creation and usage of Domain Specific Languages. The
following two chapters describe GReg itself. Section 5 describes the language
designed for expressing biological mechanisms and the corresponding queries, and
Section 6 provides a simple example taken from the literature. Finally, Section 7
concludes the article and discusses the future research perspectives in this area.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Chemical and biological models covered by GReg</title>
      <p>tra
nscriptio</p>
      <p>n
DNA
splicing</p>
      <p>translation
pre-mRNA</p>
      <p>mRNA</p>
      <p>The purpose of GReg is to describe genetic regulatory mechanisms
controlling the DNA to protein process (Figure 1). This process comprises three steps:
transcription, splicing and translation. The regulation of each step can be
modeled using standard chemical reactions, presented in Section 2.1. For the
transcription initiation we use the genetic regulatory mechanism model, presented
in Section 2.2. Finally we define a cell network in Section 2.3. We separate these
three domain models to clearly distinguish chemical from biological concepts.</p>
      <p>
        We use the same definition of molecule level as presented in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], if a molecule
has n distinct actions, we define (n + 1) levels. This allows us to use a discrete
formalism to efficiently model the gene expression [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], which is in fact the
concentration of the gene products. The lowest level is the lowest transcription
rate of a gene.
2.1
      </p>
      <p>Chemical compartment model
A chemical compartment = hM ; Ri 2 K , is composed of a non-empty set of
molecules (? 6= M M olecules) and a set of reactions (R Reactions). Two
compartments may be separated by a membrane (i.e., selective barrier), thus
allowing molecule transfer between them.</p>
      <p>A chemical reaction = hRe; Pr ; Ca; k; type i 2 R, where Re; Pr ; Ca
M(M ) K , the reactants (Re), products (Pr ) and catalysts (Ca) are defined as
the association of a multiset of one molecule (M(M )) with a given compartment
( 2 K ).</p>
      <p>Catalysts (Ca) have the particularity to appear as reactants and products
in a chemical reaction. We have chosen to define the catalysts in a distinct set,
therefore no catalyst can appear as reactant or product:
8m 2 M ; m 2 Ca =) m 2= Re ^ m 2= Pr .</p>
      <p>A reaction may be either irreversible or reversible, type 2 firr; revg. In
reversible reactions an equilibrium constant (k) is defined with the ratio of both
direction rates.
2.2</p>
      <p>Genetic regulatory mechanism model
A genetic regulatory mechanism = h ; C i, is composed of a non-empty set of
genes ( ). Genes are organized into one or more chromosomes (C ). A genetic
regulatory mechanism is contained in a chemical compartment, this specification
will be presented in Section 2.3.</p>
      <p>A chromosome c = 1; : : : n 2 C is a sequence of loci. A gene may have
different version (i.e., alleles) at a given chromosome location (i.e., locus). And a
locus = h i defines a non-empty set of genes (? 6= ) that are located
at a given locus. Then the set of all possible chromosomes in a mechanism is the
Cartesian product of the set of genes at each locus:
Chromosomes = 1 n .</p>
      <p>A gene = hM ; i 2 is a portion of DNA that codes for at least one
molecule (? 6= M M ), and may contain some regulation sites ( Sites).</p>
      <p>A regulation site = hM ; type i 2 defines the non-empty set of
regulatory molecules (? 6= M M ) associated to the regulation site of a given
gene. Note that when using anti-termination sites, genes order matters, therefore
chromosomes must be defined.</p>
      <p>I1 O1</p>
      <p>In On
start</p>
      <p>A1 T1</p>
      <p>An Tn
stop
promoter region</p>
      <p>transcribed region</p>
      <p>Our idealized gene structure (Figure 2) is composed of two regions: promoter
and transcribed. Note that the exact position in the gene of each regulatory site
is not specified, i.e., we are mainly interested in its regulatory role. The type of
a regulation site is type 2 fI;O;A;Tg.</p>
      <p>Initiation (I) the portion of DNA where bound activators increase the rate of
the transcription process. It is located in the promoter region;
Operator (O) the portion of DNA where bound repressors block the
transcription process. It is located in the promoter region;
Anti-termination (A) the portion of DNA where activators continue the
transcription process. It is located in the transcribed region. These sites may also
allow the transcription of the next gene;
Termination (T) (also called attenuator) the portion of DNA where repressors
stop the transcription process. It is located in the transcribed region; thus,
it produces a reduced RNA.
2.3</p>
      <p>Cell network
The cell network model is designed to model the interactions of different cells
with their environment and also with their inner components (i.e., organelles).
This model authorizes the construction of currently not observed cells, e.g.,
prokaryote with nucleus, eukaryote with multiple nuclei, etc. The validity of the
specification is delegated to the user (i.e., domain expert).</p>
      <p>This model defines three chemical compartments, therefore they inherit both
sets M and R.</p>
      <p>Organelle, ! = hM ; R; i is the lowest compartment in the compartment
hierarchy. An organelle may contain a mechanism ( ), e.g., nucleus,
mitochondrial DNA, etc.</p>
      <p>Cell, = hM ; R; ; i contains a possibly empty set of organelles ( ). The
model of a prokaryote cell would define a mechanism ( ), by cons an
eukaryote cell would define instead an organelle with a mechanism (i.e., nucleus).
Network, = hM ; R; i represents the environment and contains one or more
cells ( ).
3</p>
    </sec>
    <sec id="sec-3">
      <title>Related work</title>
      <p>In this section we compare three well-known tools with our approach. We first
define a few criteria such as the kind of analysis, the supported formalism and
the supported exchange format. Table 1 presents a summary of the resulting
comparison.</p>
      <p>
        Domain language To be productive, the syntax of the input language should
be as close as possible to the actual domain of the user. This input
language can be textual (like in tools that use Systems Biology Markup
Language (SBML) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]) or graphical (like Systems Biology Graphical Notation
(SBGN) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]).
Simulation &amp; Model Checking Although there are many tools adapted to
biological process design and simulation, only a few of them allow exhaustive
exploration of the state space. While simulation is very useful during model
elaboration, an exhaustive search may help to discover pathological cases
that would have never been explored by simulation.
      </p>
      <p>
        Discrete &amp; continuous Continuous models are closer to the real biological
systems than discrete models, but unlike the latter they are not adapted for
model checking techniques. Discrete formalisms allow a complete exploration
of the state space while preserving the qualitative properties of the system,
as mentioned in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>Exchange format The supported interchange format is an important feature
as it allows us to bridge the gap between different tools and therefore enables
the user to use the most adapted tool to hand. SBML is a common
interchange format based on XML. It is used to describe biochemical reactions,
gene regulation and many other topics.</p>
      <p>
        Cell Illustrator [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] is an example of a commercial simulation tool for continuous
and discrete domains. The graphical formalism is based on PN, called Hybrid
Functional Petri Nets with extensions (HFPNe), which adds the notions of
continuous and generic processes and quantities [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The XML-based exchange file
format used in Cell Illustrator is called CSML.
      </p>
      <p>
        Gene Interaction Network simulation a.k.a. GinSim [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] is a tool for the
modeling and simulation of genetic regulatory networks. It models genetic regulatory
networks based on a discrete formalism [
        <xref ref-type="bibr" rid="ref13 ref6">6,13</xref>
        ]. These models are stored using
the XML-based format ginml. The simulation computes a state transition graph
representing the dynamical behavior network. GINsim uses a graphical Domain
Specific Language (DSL) called Logical Regulatory Graph (LRG) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Models
in LRG are graphs, where nodes are regulatory components (i.e., molecules and
genes) and arcs are regulatory interactions (i.e., activation and repression)
between the nodes.
      </p>
      <p>
        Cytoscape [
        <xref ref-type="bibr" rid="ref15 ref7">7,15</xref>
        ] is an open source software platform for visualizing complex
networks and integrating these with any type of attribute data. Cytoscape supports
many file formats including PSI-MI and SBML. It has the advantage of adding
features through a plug-in system. Many plug-ins are available for various
domains such as biology, bioinformatics, social network analysis and the semantic
web. Over 100 plug-ins are listed on the official website.
4
      </p>
    </sec>
    <sec id="sec-4">
      <title>DSL approach</title>
      <p>Model checking involves verifying whether a property holds on the whole set of
possible states of a given model. To generate this complete state space, the model
must be expressed in a formal language intended for this operation, like Petri
Nets (PNs). This makes the model checking approach impractical for people who
do not master these formal languages. We propose using DSLs for this purpose.</p>
      <sec id="sec-4-1">
        <title>Tool</title>
      </sec>
      <sec id="sec-4-2">
        <title>Domain language</title>
      </sec>
      <sec id="sec-4-3">
        <title>Simulation</title>
      </sec>
      <sec id="sec-4-4">
        <title>State space</title>
      </sec>
      <sec id="sec-4-5">
        <title>Model checking</title>
      </sec>
      <sec id="sec-4-6">
        <title>Discrete</title>
      </sec>
      <sec id="sec-4-7">
        <title>Continuous</title>
      </sec>
      <sec id="sec-4-8">
        <title>Exchange format</title>
        <p>Cell Illustrator
3
3
5
5
3
3</p>
      </sec>
      <sec id="sec-4-9">
        <title>CSML</title>
        <p>A DSL is a programming or specification language tailored for a given
domain; it presents a reduced set of instructions closely related to this domain.
Using a DSL has two main objectives. First, learning the language should be
easy for someone with enough knowledge about the domain, even if this person
does not have previous knowledge of other languages. Second, the number of
errors made by a novice user should be drastically reduced as the expressivity
of the language is reduced to the minimum.</p>
        <p>The DSL semantics are defined by transformation into a target language,
which is a formal language where complex operations (like model checking) can
be performed. Usually, the scope of the target language is broader than the scope
of the DSL. This allows using the same target language and its associated tools
for different DSLs. Moreover, while creating a new DSL, it is often possible to use
an already-existing language as a target, thus facilitating the language creation
process. The results obtained in the target language are translated again into
the DSL and returned to the user. This process is described in Figure 3. After
the creation of the initial model, all the following steps must be fully automatic,
to hide the underlying complexity from the end user.</p>
        <p>Manual
intervention
Automatic
processing</p>
        <p>Model of the system
Domain Specific</p>
        <p>Language</p>
        <p>T
r
a
n
s
fr
o
m
a
it
o
n
Model of the system</p>
        <p>Formal language</p>
        <p>Domain
expert
Computation tool</p>
        <p>Counter-example
Domain Specific</p>
        <p>Language
n
o
it
a
m
fr
o
s
n
a
r</p>
        <p>T
Counter-example
Formal language
Language
engineer</p>
        <p>Transform.</p>
        <p>The person in charge of the DSL creation is a language engineer. This person
should obviously have a certain knowledge about the language creation process,
but he should also master the target platform language, in order to define efficient
and correct transformations. Furthermore, he should be in contact with at least
one domain expert, in order to settle the requirements and verify the correctness
and completeness of the language created. The creation of a DSL follows a set of
specific steps. First, the language engineer must identify the abstract concepts
of the domain. These concepts include the basic elements of the domain, the
interactions between these elements and the precise boundaries of the domain
considered. Based on these concepts, the language engineer must define a set of
expressions used to create a specific model in the domain, i.e., a concrete syntax.
Finally, the language engineer must define the semantics of the language created,
usually by transformation to an existing platform. The whole process must be
validated by one or more domain experts. Domain experts must validate the
three steps of the DSL creation process: the domain must have been correctly
defined, the expressions of the concrete syntax must be close to the already
existing languages in the domain, and the execution must return the expected
results. This creation/validation process often leads to an iterative development
of the language. We show this entire process in Figure 4.</p>
        <p>
          There exist various DSLs tailored for the biological processes, e.g., SBML [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]
and SBGN [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. These two well-known languages cover a wide range of systems,
mainly in the bio-chemical domain. GReg, instead, focuses on a more specific
domain, which is genetic regulatory mechanisms. This domain has been described
in Section 2.
        </p>
        <p>
          While creating GReg, we used the Eclipse Modeling Project (EMP)[
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]
approach. We first created a metamodel of the domain using the Eclipse
Modeling Framework (EMF), and we defined a concrete syntax with XText. XText
provides a set of tools to create an editor for a given language, with some
userfriendly features such as syntax highlighting, on the fly syntax checking (see
Figure 5) and auto-completion. As a target platform we chose AlPiNA. AlPiNA[
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]
is a model checking tool for Algebraic Petri Nets (APNs). It aims to perform
efficient model checking on models with extremely large state spaces, using
Decision Diagrams (DDs) to tackle the state explosion problem. AlPiNA’s input
languages were also defined using the EMP approach. This allowed us to use
Atlas Transformation Language (ATL) transformations, which is a tool dedicated
to define model to model transformations. GReg is thus fully integrated in the
Eclipse/EMP framework.
        </p>
        <p>The modular structure of GReg’s definition would allow us to replace the
target domain while keeping exactly the same language. If needed, we could, for
example, define a transformation to SBML using a model to text transformation
tool like XPand.
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>GReg : Gene Regulation Language</title>
      <p>
        GReg is a Domain Specific Language designed to describe genetic regulatory
mechanisms. We built it in order to illustrate the DSL approach, and the benefits
it provides to research in the life sciences domain. Throughout this section, we
introduce the GReg language using an excerpt of the lac operon model [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. We
also introduce the GReg Query Language (GQL) language, used to specify the
queries to be executed in the model specified in GReg.
      </p>
      <p>We first show how to describe a regulation mechanism. Listing 1 shows the
overall structure of a GReg mechanism specification. The mechanism is named
(lac_operon). It specifies the molecules occurring in the mechanism, and the
chemical reactions between these molecules. The GReg description also specifies
the genes with their properties and organization into chromosomes.
mechanism lac_operon i s
molecules</p>
      <p>d e c l a r a t i o n o f m o l e c u l e s
reactions</p>
      <p>d e c l a r a t i o n o f c h e m i c a l r e a c t i o n s
chromosomes</p>
      <p>d e c l a r a t i o n o f chromosomes
gene d e c l a r a t i o n o f a g e n e</p>
      <p>d e c l a r a t i o n o f o t h e r g e n e s
end lac_operon</p>
      <p>Listing 1: GReg mechanism specification</p>
      <p>The molecules section of a GReg mechanism lac_operon i s
description specifies the molecules oc- molecules
curring in the mechanism. For in- lactose , allolactose ,
stance, in Listing 2, the lac_operon lacI , lacZ , lacY , lacA ,
mechanism uses molecules lactose, cAMP , CAP
allolactose, lacI, lacZ, etc. ...</p>
      <p>Molecules are only described by end lac_operon
their names, as it is the only infor- Listing 2: Molecules declaration
mation relevant in our language. The DSL approach emphasizes specification of
only the required information for the particular domain. No molecules other than
the ones described here can be used in the mechanism. This constraint is useful
for the user creating a GReg specification: spelling errors in molecule names are
detected, see Figure 5.</p>
      <p>After the molecules declaration, a GReg description specifies the chemical
reactions that take part in the mechanism. Listing 3 presents the reactions part
of the mechanism.</p>
      <p>mechanism l a c _ o p e r o n i s
...
r e a c t i o n s
in du ct io n : lacI + a l l o l a c t o s e ! _
allo : lactose ! a l l o l a c t o s e cat lacZ
...
end l a c _ o p e r o n</p>
      <sec id="sec-5-1">
        <title>Listing 3: Reactions declaration</title>
        <p>In GReg, each reaction has a name, for instance induction. As usual in chemical
notations, a reaction is a relation among (weighted) molecules. The molecule
weight is usually called the stoichiometric coefficient. By default, stoichiometric
coefficients are valued 1.</p>
        <p>A reaction can be either irreversible ( ! ) or reversible ( $ ). In our example,
the reaction induction is a degradation of lacI and allolactose. A degradation
is an irreversible reaction where products are not interesting, thus the reactants
are simply removed from the system.</p>
        <p>If needed, each direction of the reaction can be given a reaction rate (i.e.,
probability). Each reaction can also have catalysts specified using the keyword cat.
For instance allo is a reaction catalyzed by lacZ.</p>
        <p>Listing 4 presents the genes
specification. For instance, rep is a
minimal gene (i.e., not regulated). A
minimal gene defines at least the molecules
it codes. If they are relevant,
regulation sites are also specified in a section
with the sites keyword. The lac gene
defines a regulated gene with two
regulation sites I and O , together with
the molecule acting on them. Note
that GReg also allows us to define
several regulation sites for I, O, A, T.
mechanism l a c _ o p e r o n i s
...
gene rep</p>
        <p>codes lacI
end rep
gene lac
codes lacZ , lacY , lacA
s i t e s</p>
        <p>I : cAMP and CAP = 1</p>
        <p>O : lacI @ 2
end lac
end l a c _ o p e r o n</p>
      </sec>
      <sec id="sec-5-2">
        <title>Listing 4: Genes declaration</title>
        <p>As molecules may be present at different levels, the @ keyword allows the
specification of the required molecules levels acting at a regulation site. By default,
molecule levels are valued 1. As several molecules can act on one regulation site,
GReg allows to combine molecules with Boolean operators for a regulation site.
There are two operators defined : and and or. For instance the I site of lac gene
specifies that cAMP and CAP are required to activate this site. The = keyword
allows to specify the target level attributed to the gene once this site is active.</p>
        <p>Note that for A sites, it is also possible to specify the next gene target level.
As the role of a T site is to interrupt the transcription process, we allow the
specification of the reduced set of produced molecules when these sites are active.</p>
        <p>The chromosomes section is used
to specify one or more chromosomes.</p>
        <p>A chromosome defines the sequence of
loci. A locus is defined between two
braces. Note that genes’order in each
locus does not matter. This section is
mandatory when taking into account
A sites.
mechanism la c_ ope ro n i s
...
chromosomes</p>
        <p>c : { rep } , { lac , lac ’}
...
end la c_ ope ro n</p>
      </sec>
      <sec id="sec-5-3">
        <title>Listing 5: Chromosomes declaration</title>
        <p>Listing 6 shows an example of GQL specification. The use keyword
imports the lac_operon mechanism from another file, allowing us to reference the
molecules declared in lac_operon mechanism from a query specification. A GQL
file specifies the levels and the queries.</p>
        <p>The levels section is used to de- use " la c_o pe ro n . greg "
fine the combination of levels. A com- l e v e l s
bination of levels is a partial or to- l1 : lacZ = 1
tal definition of molecule levels, while l2 : lacZ = 0 , lacI = 2
unspecified molecules may match any queries
possible value. For instance l1 speci- bool a : e x i s t s l1
fies only the level of lacZ among all ppaatthhss bc :: ppaatthhss ll11 , ..l2 l2
molecules defined in lac_operon. The
exists query returns true if predefined Listing 6: GQL queries specification
level exists. The paths query is used to retrieve all paths from the state space
matching the sequence of predefined levels. The query b returns the path where
l2 is a direct successor of l1. But query c does not require that l2 is a direct
successor of l1.</p>
        <p>Listing 7 shows an example of use " la c_o pe ro n . greg "
a GReg configuration specification use " la c_o pe ro n . gql "
given outside the mechanism, usually i n i t i a l l y la c_ ope ro n has
in a separate file. This allows to eas- lactose = 1
ily repeat experiments for the same lacI = 1
mechanism with several initial quan- execute
tities. The first section starts with i f a then ( b and c )
the initially keyword and defines the Listing 7: GReg specification
genes or molecules initial levels. The second section starts with the execute
keyword and is used to specify which queries will be executed by the model
checker.
6</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Example</title>
      <p>
        We present a simple example of a genetic regulatory mechanism taken from [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]
with three genes. Gene Y is activated by the product of gene X. Genes X and
Z are repressed by the products of genes Z and Y respectively. The products
of genes X, Y and Z are molecules x, y and z respectively. Graphical (LRG)
and textual (GReg) models derived from this example are given at Figure 6 and
Listing 8 respectively.
      </p>
      <p>X</p>
      <p>Y
Z
mechanism example i s
molecules x , y , z
gene X codes x</p>
      <p>s i t e s O : z
end X
gene Y codes y</p>
      <p>s i t e s I : x
end Y
gene Z codes z</p>
      <p>s i t e s O : y
end Z
end example</p>
      <p>From theses models AlPiNA is able to compute the state space and to identify
all deadlock without requiring any additional input from the user.</p>
      <p>A deadlock is a situation where no more events can occur in the system.
Strictly speaking, in real biological systems there are no deadlocks but livelocks,
a situation where events still occur but without changing the state of the
system. The states where such situations occur are usually called stable states or
attractors.</p>
      <p>But we can also search for specific properties of the biological system. As
previously mentioned, the state space might contain too many states to be used as
it is. Therefore we propose a way to extract portions of state space (e.g., subsets
Xpre</p>
      <p>Xabs
X
0
x
Xact
Xrep
Xact</p>
      <p>Yrep
Zpre
Xabs</p>
      <p>Yact
Zact
Zrep
Zact
of states, paths, cycles, ...) through GQL queries. In Listing 9, we have defined
one level (l1) and one query (at).</p>
      <p>s1 : e x i s t s ( $x in x , $y in y , $z in z :
(( $x equals suc ( zero )) and ( $y equals suc ( zero ))) = false</p>
      <p>Listing 10: AlPiNA property expression of query in Listing 9</p>
      <p>The example in Figure 6 and Listing 8 has eight states reachable from the
initial marking, where all molecules are initially at level zero (x,y,z) = (0,0,0),
see Figure 9. Model checking of the example PN finds two stable states: (1,1,0)
and (0,0,1) and returns for s1 the two states: (1,1,0) and (1,1,1).
x = 1
y = 1
z = 0
x = 1
y = 1
z = 1</p>
      <p>Xact
Xrep
x = 0
y = 1
z = 0
Zrep
x = 0
y = 1
z = 1</p>
      <p>Legend</p>
      <sec id="sec-6-1">
        <title>Initial state</title>
      </sec>
      <sec id="sec-6-2">
        <title>Stable state</title>
        <p>Yrep
Yact
Yact
Yrep</p>
        <p>Fig. 9: State space of example in Figure 6 and Listing 8
7</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Conclusion &amp; future work</title>
      <p>This paper introduces Gene Regulation Language (GReg), a language dedicated
to the modeling of regulatory mechanisms. We explain the need to explore
completely the sets of possible behaviors of a given model in order to detect rare
events. GReg includes a query language used to express the properties of such
events.</p>
      <p>From a technical point of view, we explain that the techniques proposed
are based on general principles borrowed from software modeling and
verification. These techniques include the use of a dedicated DSL defined with a
metamodeling approach and the translation of this language into a formal verification
platform called AlPiNA.</p>
      <p>The languages and transformations shown in this article have been
implemented and tested on several toy examples. We also asked biologists to assess
the expressivity and usability of the language. Although the first feedback seems
promising, there is much room for improvement. We foresee three main axes of
future development: improving the expressivity of the modeling and query
languages, assessing the usability of the approach and exploring the mitigation of
the state space explosion.</p>
      <p>Extending the expressivity of GReg Concepts such as time and
probabilities play an important role in biology and are therefore good candidates
for a language extension. As the current underlying formalism, APNs, does
not support these notions, such extension would require changing the
target platform. Good examples of target formalisms are timed Petri nets and
stochastic Petri nets. As mentioned in 4, the techniques used to create GReg
allow changing the target language without changing the language itself.
Note that we do not plan to add continuous concepts, used in languages
such as HFPNe.</p>
      <p>Improving the usability of our tool Textual domain specific languages
constitute a first step towards democratization of formal methods. Although
highly efficient, textual languages are usually not as intuitive as graphical
languages. On the other hand, graphical domain specific languages are
especially good in the early phase of the modeling as well as for documentation,
but they are often less practical when the model grows. The tools in EMP
that were used to create GReg allow us to define a graphical version of the
same language, thus keeping the best of both worlds.</p>
      <p>Another way to ease the modeling phase is to allow import/export of models
from/to other formalisms and standards such as SBML and to integrate it
with Cytoscape through its plug-in mechanism.</p>
      <p>
        Mitigating the state space explosion So far, we have done little
experimentation in this area for biological processes. Nevertheless, we conducted several
studies on usual IT protocols and software models that show that AlPiNA
can handle huge state spaces[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This suggests promising results in the
regulatory mechanisms domain.
      </p>
      <p>The development of GReg is a work in progress, we would like to set up more
collaborations with biologists interested in exploiting formal techniques from
computer science to discover rare events. We think that we can make, in the
near future, a useful contribution to life sciences based on advanced techniques
borrowed from computer science.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Eclipse</given-names>
            <surname>Modeling</surname>
          </string-name>
          <article-title>Project</article-title>
          . http://www.eclipse.org/modeling/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>D.</given-names>
            <surname>Buchs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hostettler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Marechal</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Risoldi. AlPiNA: A Symbolic Model</surname>
          </string-name>
          <article-title>Checker</article-title>
          . In Petri Nets'
          <year>2010</year>
          :
          <article-title>Applications and Theory of Petri Nets</article-title>
          , Braga, Portugal, volume
          <volume>6128</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>287</fpage>
          -
          <lpage>296</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>D.</given-names>
            <surname>Buchs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hostettler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Marechal</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Risoldi</surname>
          </string-name>
          .
          <article-title>AlPiNA: An Algebraic Petri Net Analyzer</article-title>
          . In TACAS'2010:
          <article-title>Tools and Algorithms for the Construction and Analysis of Systems</article-title>
          , Paphos, Cyprus, volume
          <volume>6015</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>349</fpage>
          -
          <lpage>352</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Burch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Clarke</surname>
          </string-name>
          ,
          <string-name>
            <surname>K. L. McMillan</surname>
            ,
            <given-names>D. L.</given-names>
          </string-name>
          <string-name>
            <surname>Dill</surname>
            , and
            <given-names>L. J.</given-names>
          </string-name>
          <string-name>
            <surname>Hwang</surname>
          </string-name>
          .
          <source>Symbolic Model Checking: 1020 States and Beyond. Information and Computation</source>
          ,
          <volume>98</volume>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>C.</given-names>
            <surname>Chaouiya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Klaudel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Pommereau</surname>
          </string-name>
          . A Modular,
          <article-title>Qualitative Modeling of Regulatory Networks Using Petri Nets</article-title>
          .
          <source>In Modeling in Systems Biology</source>
          , volume
          <volume>16</volume>
          of Computational Biology, pages
          <fpage>253</fpage>
          -
          <lpage>279</lpage>
          . Springer London,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>C.</given-names>
            <surname>Chaouiya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Remy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mossé</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Thieffry</surname>
          </string-name>
          .
          <article-title>Qualitative Analysis of Regulatory Graphs: A Computational Tool Based on a Discrete Formal Framework</article-title>
          .
          <source>In Positive Systems, volume 294 of Lecture Notes in Control and Information Sciences</source>
          , pages
          <fpage>830</fpage>
          -
          <lpage>832</lpage>
          .
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Cytoscape</given-names>
            <surname>Consortium</surname>
          </string-name>
          and
          <string-name>
            <given-names>Funding</given-names>
            <surname>Agencies</surname>
          </string-name>
          . http://www.cytoscape.org/.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>M.</given-names>
            <surname>Hucka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hoops</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Keating</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sahle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schaff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Smith</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          .
          <source>The Systems Biology Markup Language (SBML): Language Specification for Level 3 Version 1 Core</source>
          ,
          <year>2010</year>
          . Available from Nature Precedings http://dx.doi.org/10.1038/npre.
          <year>2010</year>
          .
          <volume>4959</volume>
          .1.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>J-M. Couvreur</surname>
            and
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Encrenaz</surname>
            and
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Paviot-Adet</surname>
            and
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Poitrenaud</surname>
          </string-name>
          and PA. Wacrenier.
          <article-title>Data Decision Diagrams for Petri Net Analysis</article-title>
          .
          <source>In ATPN'02: International Conference on Application and Theory of Petri Nets</source>
          , volume
          <volume>2360</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>101</fpage>
          -
          <lpage>120</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>F.</given-names>
            <surname>Jacob</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Monod</surname>
          </string-name>
          .
          <article-title>Genetic Regulatory Mechanisms in the Synthesis of Proteins</article-title>
          .
          <source>Journal of molecular biology</source>
          ,
          <volume>3</volume>
          :
          <fpage>318</fpage>
          -
          <lpage>356</lpage>
          ,
          <year>1961</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>S.</given-names>
            <surname>Moodie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. Le</given-names>
            <surname>Novere</surname>
          </string-name>
          , E. Demir, H. Mi,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Villeger</surname>
          </string-name>
          .
          <source>Systems Biology Graphical Notation: Process Description language Level 1</source>
          ,
          <year>2011</year>
          . Available from Nature Precedings http://dx.doi.org/10.1038/npre.
          <year>2011</year>
          .
          <volume>3721</volume>
          .4.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>M. Nagasaki</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Doi</surname>
            , H. Matsuno, and
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Miyano</surname>
          </string-name>
          .
          <article-title>A Versatile Petri Net Based Architecture for Modeling and Simulation of Complex Biological Processes</article-title>
          .
          <source>Genome Informatics</source>
          ,
          <volume>15</volume>
          (
          <issue>1</issue>
          ):
          <fpage>180</fpage>
          -
          <lpage>197</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>A.</given-names>
            <surname>Naldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Berenguier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fauré</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Lopez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Thieffry</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Chaouiya</surname>
          </string-name>
          .
          <article-title>Logical Modelling of Regulatory Networks with GINsim 2.3</article-title>
          . Biosystems,
          <volume>97</volume>
          (
          <issue>2</issue>
          ),
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>A.</given-names>
            <surname>Naldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chaouiya</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Thieffry</surname>
          </string-name>
          . GINsim. http://gin.univ-mrs.fr/.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>P.</given-names>
            <surname>Shannon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Markiel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Ozier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.S.</given-names>
            <surname>Baliga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ramage</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Amin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schwikowski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Ideker</surname>
          </string-name>
          .
          <article-title>Cytoscape: a Software Environment for Integrated Models of Biomolecular Interaction Networks</article-title>
          .
          <source>Genome Research</source>
          ,
          <volume>13</volume>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>16. The University of Tokyo. CellIllustrator. http://www.cellillustrator.com/.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17. R. Thomas.
          <article-title>Regulatory Networks seen as Asynchronous Automata: a Logical Description</article-title>
          .
          <source>Journal of Theoretical Biology</source>
          ,
          <volume>153</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          ,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>A.</given-names>
            <surname>Valmari</surname>
          </string-name>
          .
          <article-title>The State Explosion Problem</article-title>
          .
          <source>In Lectures on Petri Nets I: Basic Models, Advances in Petri Nets</source>
          , pages
          <fpage>429</fpage>
          -
          <lpage>528</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>