<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">GReg : a domain specific language for the modeling of genetic regulatory mechanisms</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Nicolas</forename><surname>Sedlmajer</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Université de Genève</orgName>
								<address>
									<addrLine>7 route de Drize</addrLine>
									<postCode>1227</postCode>
									<settlement>Carouge</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Didier</forename><surname>Buchs</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Université de Genève</orgName>
								<address>
									<addrLine>7 route de Drize</addrLine>
									<postCode>1227</postCode>
									<settlement>Carouge</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Steve</forename><surname>Hostettler</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Université de Genève</orgName>
								<address>
									<addrLine>7 route de Drize</addrLine>
									<postCode>1227</postCode>
									<settlement>Carouge</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Alban</forename><surname>Linard</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Université de Genève</orgName>
								<address>
									<addrLine>7 route de Drize</addrLine>
									<postCode>1227</postCode>
									<settlement>Carouge</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Edmundo</forename><surname>Lopez</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Université de Genève</orgName>
								<address>
									<addrLine>7 route de Drize</addrLine>
									<postCode>1227</postCode>
									<settlement>Carouge</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Alexis</forename><surname>Marechal</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Université de Genève</orgName>
								<address>
									<addrLine>7 route de Drize</addrLine>
									<postCode>1227</postCode>
									<settlement>Carouge</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">GReg : a domain specific language for the modeling of genetic regulatory mechanisms</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">78085589EBAB3C93C41D9D524F03E9CE</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T16:45+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Chemical and biological systems have similarities with ITsystems as they can be observed as sequences of events. Most available tools propose simulation frameworks to explore biological pathways (i.e., sequences of events). Simulation only explores a few of the most probable pathways in the system. On the contrary, techniques such as model checking, coming from IT-systems analysis, explore all the possible behaviors of the modeled systems, thus helping to identify interesting pathways. A main drawback from most model checking tools in the life sciences domain is that they take as input a language designed for computer scientists, that is not easily understood by non-expert users. We propose in this article an approach based on Domain Specific Languages. It provides a comprehensible language to describe the system while allowing the use of complex and powerful underlying model checking techniques.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Because of their stochastic and combinatorial nature, many biological systems such as cellular and supra-cellular interactions are very hard to investigate. Current practice is mainly limited to the use of in vivo and in vitro experiments. Investigation through formal models of biological systems is currently a rather restricted research field, unlike what has been done in other natural sciences such as chemistry and physics. There is clearly an emerging field of research where future experiments can be partially performed in silico, i.e., by means of techniques from computer science. One of the main approaches of biological modeling is the so-called regulatory networks <ref type="bibr" target="#b15">[17,</ref><ref type="bibr" target="#b4">5]</ref>. The main idea of biological modeling according to the regulatory network approach is to model interbiological reactions through a set of interdependent biological rules. This can be seen as a set of discrete modules having strong interconnections. The occurrence of interesting events in the biological system can be represented as logical properties expressed on the state of these modules. This is very similar to the kind of properties computer scientists validate on hardware and software systems (deadlocks, error states,. . . ).</p><p>Among the tools available in this domain, the main analysis approach for regulatory networks is simulation. Simulation is generating and analyzing a limited sample of possible system behaviors. This technique is not convenient when the main purpose of the research is to look for rare or abnormal behaviors (e.g., cancer). The main approach in this case is to use model checking instead of simulation. Model checking consists in generating and analyzing the complete set of possible states of the system. Naturally, this technique suffers from the drawback of the enormous number of possible states of biological systems.</p><p>It is interesting to note that this problem is well-known to the model checking community in computer science, where it is called the state space explosion <ref type="bibr" target="#b16">[18]</ref>. There is a parallel between cellular interactions and software systems in that the state space explosion is mainly due to their concurrent nature. Therefore, we can apply techniques that have been developed for the model checking of hardware and software systems to biological interactions. Approaches based on a symbolic encoding of the state space are particularly well-suited for this <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b8">9]</ref>.</p><p>In this paper we show a work in progress in our group. We present Gene Regulation Language (GReg), our first attempt to build a framework for modeling and analyzing biological systems based on formal modeling and reasoning. Advanced techniques for defining Domain Specific Languages, giving their semantics and analyzing them using symbolic model checking are presented. First, Section 2 describes precisely the biological domain considered by GReg, then Section 3 outlines the state of the current research in the field and Section 4 describes in detail the creation and usage of Domain Specific Languages. The following two chapters describe GReg itself. Section 5 describes the language designed for expressing biological mechanisms and the corresponding queries, and Section 6 provides a simple example taken from the literature. Finally, Section 7 concludes the article and discusses the future research perspectives in this area.</p><p>2 Chemical and biological models covered by GReg t r a n s cr ip t i o n s p li cin g t r a n slat i o n DNA pre-mRNA mRNA protein</p><p>Fig. <ref type="figure">1</ref>: DNA to protein process</p><p>The purpose of GReg is to describe genetic regulatory mechanisms controlling the DNA to protein process (Figure <ref type="figure">1</ref>). This process comprises three steps: transcription, splicing and translation. The regulation of each step can be modeled using standard chemical reactions, presented in Section 2.1. For the transcription initiation we use the genetic regulatory mechanism model, presented in Section 2.2. Finally we define a cell network in Section 2.3. We separate these three domain models to clearly distinguish chemical from biological concepts.</p><p>We use the same definition of molecule level as presented in <ref type="bibr" target="#b15">[17]</ref>, if a molecule has n distinct actions, we define (n + 1) levels. This allows us to use a discrete formalism to efficiently model the gene expression <ref type="bibr" target="#b15">[17]</ref>, which is in fact the concentration of the gene products. The lowest level is the lowest transcription rate of a gene.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Chemical compartment model</head><formula xml:id="formula_0">A chemical compartment κ = M κ , R ∈ K ,</formula><p>is composed of a non-empty set of molecules (∅ = M κ ⊆ M olecules) and a set of reactions (R ⊆ Reactions). Two compartments may be separated by a membrane (i.e., selective barrier), thus allowing molecule transfer between them.</p><p>A chemical reaction ρ = Re, Pr , Ca, k, type ρ ∈ R, where Re, Pr , Ca ⊆ M(M κ )×K , the reactants (Re), products (Pr ) and catalysts (Ca) are defined as the association of a multiset of one molecule (M(M κ )) with a given compartment (κ ∈ K ).</p><p>Catalysts (Ca) have the particularity to appear as reactants and products in a chemical reaction. We have chosen to define the catalysts in a distinct set, therefore no catalyst can appear as reactant or product:</p><formula xml:id="formula_1">∀m ∈ M κ , m ∈ Ca =⇒ m /</formula><p>∈ Re ∧ m / ∈ Pr . A reaction may be either irreversible or reversible, type ρ ∈ {irr, rev}. In reversible reactions an equilibrium constant (k) is defined with the ratio of both direction rates.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Genetic regulatory mechanism model</head><p>A genetic regulatory mechanism µ = Γ µ , C , is composed of a non-empty set of genes (Γ µ ). Genes are organized into one or more chromosomes (C ). A genetic regulatory mechanism is contained in a chemical compartment, this specification will be presented in Section 2.3.</p><p>A chromosome c = λ 1 , . . . λ n ∈ C is a sequence of loci. A gene may have different version (i.e., alleles) at a given chromosome location (i.e., locus). And a locus λ = Γ λ defines a non-empty set of genes (∅ = Γ λ ⊆ Γ µ ) that are located at a given locus. Then the set of all possible chromosomes in a mechanism is the Cartesian product of the set of genes at each locus:</p><formula xml:id="formula_2">Chromosomes = Γ λ1 × • • • × Γ λn .</formula><p>A gene γ = M γ , Σ ∈ Γ µ is a portion of DNA that codes for at least one molecule (∅ = M γ ⊆ M κ ), and may contain some regulation sites (Σ ⊆ Sites).</p><p>A regulation site σ = M σ , type σ ∈ Σ defines the non-empty set of regulatory molecules (∅ = M σ ⊆ M κ ) associated to the regulation site σ of a given gene. Note that when using anti-termination sites, genes order matters, therefore chromosomes must be defined.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I1 O1</head><p>In On start A1 T1 An Tn stop promoter region transcribed region Fig. <ref type="figure">2</ref>: Idealized gene structure Our idealized gene structure (Figure <ref type="figure">2</ref>) is composed of two regions: promoter and transcribed. Note that the exact position in the gene of each regulatory site is not specified, i.e., we are mainly interested in its regulatory role. The type of a regulation site is type σ ∈ {I,O,A,T}.</p><p>Initiation (I) the portion of DNA where bound activators increase the rate of the transcription process. It is located in the promoter region; Operator (O) the portion of DNA where bound repressors block the transcription process. It is located in the promoter region; Anti-termination (A) the portion of DNA where activators continue the transcription process. It is located in the transcribed region. These sites may also allow the transcription of the next gene; Termination (T) (also called attenuator) the portion of DNA where repressors stop the transcription process. It is located in the transcribed region; thus, it produces a reduced RNA.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Cell network</head><p>The cell network model is designed to model the interactions of different cells with their environment and also with their inner components (i.e., organelles). This model authorizes the construction of currently not observed cells, e.g., prokaryote with nucleus, eukaryote with multiple nuclei, etc. The validity of the specification is delegated to the user (i.e., domain expert). This model defines three chemical compartments, therefore they inherit both sets M κ and R.</p><p>Organelle, ω = M κ , R, µ is the lowest compartment in the compartment hierarchy. An organelle may contain a mechanism (µ), e.g., nucleus, mitochondrial DNA, etc. Cell, φ = M κ , R, µ, Ω contains a possibly empty set of organelles (Ω). The model of a prokaryote cell would define a mechanism (µ), by cons an eukaryote cell would define instead an organelle with a mechanism (i.e., nucleus). Network, ν = M κ , R, Φ represents the environment and contains one or more cells (Φ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Related work</head><p>In this section we compare three well-known tools with our approach. We first define a few criteria such as the kind of analysis, the supported formalism and the supported exchange format. Table <ref type="table" target="#tab_0">1</ref> presents a summary of the resulting comparison.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Domain language</head><p>To be productive, the syntax of the input language should be as close as possible to the actual domain of the user. This input language can be textual (like in tools that use Systems Biology Markup Language (SBML) <ref type="bibr" target="#b7">[8]</ref>) or graphical (like Systems Biology Graphical Notation (SBGN) <ref type="bibr" target="#b10">[11]</ref>).</p><p>Simulation &amp; Model Checking Although there are many tools adapted to biological process design and simulation, only a few of them allow exhaustive exploration of the state space. While simulation is very useful during model elaboration, an exhaustive search may help to discover pathological cases that would have never been explored by simulation. Discrete &amp; continuous Continuous models are closer to the real biological systems than discrete models, but unlike the latter they are not adapted for model checking techniques. Discrete formalisms allow a complete exploration of the state space while preserving the qualitative properties of the system, as mentioned in <ref type="bibr" target="#b15">[17]</ref>. Exchange format The supported interchange format is an important feature as it allows us to bridge the gap between different tools and therefore enables the user to use the most adapted tool to hand. SBML is a common interchange format based on XML. It is used to describe biochemical reactions, gene regulation and many other topics.</p><p>Cell Illustrator [16] is an example of a commercial simulation tool for continuous and discrete domains. The graphical formalism is based on PN, called Hybrid Functional Petri Nets with extensions (HFPNe), which adds the notions of continuous and generic processes and quantities <ref type="bibr" target="#b11">[12]</ref>. The XML-based exchange file format used in Cell Illustrator is called CSML.</p><p>Gene Interaction Network simulation a.k.a. GinSim <ref type="bibr" target="#b13">[14]</ref> is a tool for the modeling and simulation of genetic regulatory networks. It models genetic regulatory networks based on a discrete formalism <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b12">13]</ref>. These models are stored using the XML-based format ginml. The simulation computes a state transition graph representing the dynamical behavior network. GINsim uses a graphical Domain Specific Language (DSL) called Logical Regulatory Graph (LRG) <ref type="bibr" target="#b4">[5]</ref>. Models in LRG are graphs, where nodes are regulatory components (i.e., molecules and genes) and arcs are regulatory interactions (i.e., activation and repression) between the nodes.</p><p>Cytoscape <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b14">15]</ref> is an open source software platform for visualizing complex networks and integrating these with any type of attribute data. Cytoscape supports many file formats including PSI-MI and SBML. It has the advantage of adding features through a plug-in system. Many plug-ins are available for various domains such as biology, bioinformatics, social network analysis and the semantic web. Over 100 plug-ins are listed on the official website.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">DSL approach</head><p>Model checking involves verifying whether a property holds on the whole set of possible states of a given model. To generate this complete state space, the model must be expressed in a formal language intended for this operation, like Petri Nets (PNs). This makes the model checking approach impractical for people who do not master these formal languages. We propose using DSLs for this purpose. A DSL is a programming or specification language tailored for a given domain; it presents a reduced set of instructions closely related to this domain. Using a DSL has two main objectives. First, learning the language should be easy for someone with enough knowledge about the domain, even if this person does not have previous knowledge of other languages. Second, the number of errors made by a novice user should be drastically reduced as the expressivity of the language is reduced to the minimum.</p><p>The DSL semantics are defined by transformation into a target language, which is a formal language where complex operations (like model checking) can be performed. Usually, the scope of the target language is broader than the scope of the DSL. This allows using the same target language and its associated tools for different DSLs. Moreover, while creating a new DSL, it is often possible to use an already-existing language as a target, thus facilitating the language creation process. The results obtained in the target language are translated again into the DSL and returned to the user. This process is described in Figure <ref type="figure">3</ref>. After the creation of the initial model, all the following steps must be fully automatic, to hide the underlying complexity from the end user. The person in charge of the DSL creation is a language engineer. This person should obviously have a certain knowledge about the language creation process, but he should also master the target platform language, in order to define efficient and correct transformations. Furthermore, he should be in contact with at least one domain expert, in order to settle the requirements and verify the correctness and completeness of the language created. The creation of a DSL follows a set of specific steps. First, the language engineer must identify the abstract concepts of the domain. These concepts include the basic elements of the domain, the interactions between these elements and the precise boundaries of the domain considered. Based on these concepts, the language engineer must define a set of expressions used to create a specific model in the domain, i.e., a concrete syntax. Finally, the language engineer must define the semantics of the language created, usually by transformation to an existing platform. The whole process must be validated by one or more domain experts. Domain experts must validate the three steps of the DSL creation process: the domain must have been correctly defined, the expressions of the concrete syntax must be close to the already existing languages in the domain, and the execution must return the expected results. This creation/validation process often leads to an iterative development of the language. We show this entire process in Figure <ref type="figure" target="#fig_0">4</ref>.</p><p>There exist various DSLs tailored for the biological processes, e.g., SBML [8] and SBGN <ref type="bibr" target="#b10">[11]</ref>. These two well-known languages cover a wide range of systems, mainly in the bio-chemical domain. GReg, instead, focuses on a more specific domain, which is genetic regulatory mechanisms. This domain has been described in Section 2.</p><p>While creating GReg, we used the Eclipse Modeling Project (EMP) <ref type="bibr" target="#b0">[1]</ref> approach. We first created a metamodel of the domain using the Eclipse Modeling Framework (EMF), and we defined a concrete syntax with XText. XText provides a set of tools to create an editor for a given language, with some userfriendly features such as syntax highlighting, on the fly syntax checking (see Figure <ref type="figure" target="#fig_2">5</ref>) and auto-completion. As a target platform we chose AlPiNA. AlPiNA <ref type="bibr" target="#b2">[3]</ref> is a model checking tool for Algebraic Petri Nets (APNs). It aims to perform efficient model checking on models with extremely large state spaces, using Decision Diagrams (DDs) to tackle the state explosion problem. AlPiNA's input languages were also defined using the EMP approach. This allowed us to use Atlas Transformation Language (ATL) transformations, which is a tool dedicated to define model to model transformations. GReg is thus fully integrated in the Eclipse/EMP framework.</p><p>The modular structure of GReg's definition would allow us to replace the target domain while keeping exactly the same language. If needed, we could, for example, define a transformation to SBML using a model to text transformation tool like XPand.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">GReg : Gene Regulation Language</head><p>GReg is a Domain Specific Language designed to describe genetic regulatory mechanisms. We built it in order to illustrate the DSL approach, and the benefits it provides to research in the life sciences domain. Throughout this section, we introduce the GReg language using an excerpt of the lac operon model <ref type="bibr" target="#b9">[10]</ref>. We also introduce the GReg Query Language (GQL) language, used to specify the queries to be executed in the model specified in GReg.</p><p>We first show how to describe a regulation mechanism. Listing 1 shows the overall structure of a GReg mechanism specification. The mechanism is named (lac_operon). It specifies the molecules occurring in the mechanism, and the chemical reactions between these molecules. The GReg description also specifies the genes with their properties and organization into chromosomes. The molecules section of a GReg description specifies the molecules occurring in the mechanism. For instance, in Listing 2, the lac_operon mechanism uses molecules lactose, allolactose, lacI, lacZ, etc.</p><p>Molecules are only described by their names, as it is the only information relevant in our language. The DSL approach emphasizes specification of only the required information for the particular domain. No molecules other than the ones here can be used in the mechanism. This constraint is useful for the user creating a GReg specification: spelling errors in molecule names are detected, see Figure <ref type="figure" target="#fig_2">5</ref>. A reaction can be either irreversible ( → ) or reversible ( ↔ ). In our example, the reaction induction is a degradation of lacI and allolactose. A degradation is an irreversible reaction where products are not interesting, thus the reactants are simply removed from the system.</p><p>If needed, each direction of the reaction can be given a reaction rate (i.e., probability). Each reaction can also have catalysts specified using the keyword cat. For instance allo is a reaction catalyzed by lacZ. Listing 4: Genes declaration Listing 4 presents the genes specification. For instance, rep is a minimal gene (i.e., not regulated). A minimal gene defines at least the molecules it codes. If they are relevant, regulation sites are also specified in a section with the sites keyword. The lac gene defines a regulated gene with two regulation sites I and O , together with the molecule acting on them. Note that GReg also allows us to define several regulation sites for I, O, A, T.</p><p>As molecules may be present at different levels, the @ keyword allows the specification of required molecules levels acting at a regulation site. By default, molecule levels are valued 1. As several molecules can act on one regulation site, GReg allows to combine molecules with Boolean operators for a regulation site. There are two operators defined : and and or. For instance the I site of lac gene specifies that cAMP and CAP are required to activate this site. The = keyword allows to specify the target level attributed to the gene once this site is active.</p><p>Note that for A sites, it is also possible to specify the next gene target level. As the role of a T site is to interrupt the transcription process, we allow the specification of the reduced set of produced molecules when these sites are active.</p><formula xml:id="formula_3">mechanism lac_operon i s ... chromosomes c : { rep } , { lac , lac '} ... end lac_operon</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Listing 5: Chromosomes declaration</head><p>The chromosomes section is used to specify one or more chromosomes. A chromosome defines the sequence of loci. A locus is defined between two braces. Note that genes'order in each locus does not matter. This section is mandatory when taking into account A sites. Listing 6 shows an example of GQL specification. The use keyword imports the lac_operon mechanism from another file, allowing us to reference the molecules declared in lac_operon mechanism from a query specification. A GQL file specifies the levels and the queries. use " lac_operon . greg " l e v e l s l1 : lacZ = 1 l2 : lacZ = 0 , lacI = 2 queries bool a : e x i s t s l1 paths b : paths l1 , l2 paths c : paths l1 .. l2</p><p>Listing 6: GQL queries specification</p><p>The levels section is used to define the combination of levels. A combination of levels is a partial or total definition of molecule levels, while unspecified molecules may match any possible value. For instance l1 specifies only the level of lacZ among all molecules defined in lac_operon. The exists query returns true if predefined level exists. The paths query is used to retrieve all paths from the state space matching the sequence of predefined levels. The query b returns the path where l2 is a direct successor of l1. But query c does not require that l2 is a direct successor of l1. use " lac_operon . greg " use " lac_operon . gql " i n i t i a l l y lac_operon has lactose = 1 lacI = 1 execute i f a then ( b and c )</p><p>Listing 7: GReg specification Listing 7 shows an example of a GReg configuration specification given outside the mechanism, usually in a separate file. This allows to easily repeat experiments for the same mechanism with several initial quantities. The first section starts with the initially keyword and defines the genes or molecules initial levels. The second section starts with the execute keyword and is used to specify which queries will be executed by the model checker.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Example</head><p>We present a simple example a genetic regulatory mechanism taken from <ref type="bibr" target="#b15">[17]</ref> with three genes. Gene Y is activated by the product of gene X. Genes X and Z are repressed by the products of genes Z and Y respectively. The products of genes X, Y and Z are molecules x, y and z respectively. Graphical (LRG) and textual (GReg) models derived from this example are given at Figure <ref type="figure">6</ref>  Listing 8: GReg model of Figure <ref type="figure">6</ref> GReg models are transformed into APN models usable by AlPiNA. Note that the obtained APNs can always be unfolded into Place/Transition Petri Nets (P/Ts). We propose two different transformations:</p><p>the first transformation produces a P/T shown in Figure <ref type="figure">7</ref>, called Multi-level Regulatory Petri net (MRPN) in <ref type="bibr" target="#b4">[5]</ref> ; the second transformation produces an APN shown in Figure <ref type="figure">8</ref>, which is the folding of the corresponding MRPN and thus more compact.</p><p>From theses models AlPiNA is able to compute the state space and to identify all deadlock without requiring any additional input from the user.</p><p>A deadlock is a situation where no more events can occur in the system. Strictly speaking, in real biological systems there are no deadlocks but livelocks, a situation where events still occur but without changing the state of the system. The states where such situations occur are usually called stable states or attractors.</p><p>But we can also search for specific properties of the biological system. As previously mentioned, the state space might contain too many states to be used as it is. Therefore we propose a way to extract portions of state space (e.g., subsets  of states, paths, cycles, ...) through GQL queries. Listing 9, we have defined one level (l1) and one query (at).</p><p>use " example . greg " l e v e l s l1 : x = 1 , y = 1 queries s t a t e s s1 : at l1</p><p>Listing 9: GReg query example This query returns the subset (s1) of states from the state space where the level of x and y is equal to one. To compute the set of states s1, the query is transformed into an AlPiNA property, shown in Listing 10.</p><p>s1 : e x i s t s ( $x in x , $y in y , $z in z :</p><p>(( $x equals suc ( zero )) and ( $y equals suc ( zero ))) = false</p><p>Listing 10: AlPiNA property expression of query in Listing 9</p><p>The example in Figure <ref type="figure">6</ref> and Listing 8 has eight states reachable from the initial marking, where all molecules are initially at level zero (x,y,z) = (0,0,0), see Figure <ref type="figure">9</ref>. Model checking of the example PN finds two stable states: (1,1,0) and (0,0,1) and returns for s1 the two states: (1,1,0) and (1,1,1).  This paper introduces Gene Regulation Language (GReg), a language dedicated to the modeling of regulatory mechanisms. We explain the need to explore completely the sets of possible behaviors of a given model in order to detect rare events. GReg includes a query language used to express the properties of such events.</p><formula xml:id="formula_4">x = 0 y = 0 z = 0 x = 0 y = 0 z = 1 x = 0 y = 1 z = 0 x = 0 y = 1 z = 1 x = 1 y = 0 z = 0 x = 1 y = 0 z = 1 x = 1 y = 1 z = 0 x = 1 y = 1 z = 1</formula><p>From a technical point of view, we explain that the techniques are on general principles borrowed from software modeling and verification. These techniques include the use of a dedicated DSL defined with a metamodeling approach and the translation of this language into a formal verification platform called AlPiNA.</p><p>The languages and transformations shown in this article have been implemented and tested on several toy examples. We also asked biologists to assess the expressivity and usability of the language. Although the first feedback seems promising, there is much room for improvement. We foresee three main axes of future development: improving the expressivity of the modeling and query languages, assessing the usability of the approach and exploring the mitigation of the state space explosion.</p><p>Extending the expressivity of GReg Concepts such as time and probabilities play an important role in biology and are therefore good candidates for a language extension. As the current underlying formalism, APNs, does not support these notions, such extension would require changing the target platform. Good examples of target formalisms are timed Petri nets and stochastic Petri nets. As mentioned in 4, the techniques used to create GReg allow changing the target language without changing the language itself. Note that we do not plan to add continuous concepts, used in languages such as HFPNe. Improving the usability of our tool Textual domain specific languages constitute a first step towards democratization of formal methods. Although highly efficient, textual languages are usually not as intuitive as graphical languages. On the other hand, graphical domain specific languages are especially good in the early phase of the modeling as well as for documentation, but they are often less practical when the model grows. The tools in EMP that were used to create GReg allow us to define a graphical version of the same language, thus keeping the best of both worlds. Another way to ease the modeling phase is to allow import/export of models from/to other formalisms and standards such as SBML and to integrate it with Cytoscape through its plug-in mechanism. Mitigating the state space explosion So far, we have done little experimentation in this area for biological processes. Nevertheless, we conducted several studies on usual IT protocols and software models that show that AlPiNA can handle huge state spaces <ref type="bibr" target="#b1">[2]</ref>. This suggests promising results in the regulatory mechanisms domain.</p><p>The development of GReg is a work in progress, we would like to set up more collaborations with biologists interested in exploiting formal techniques from computer science to discover rare events. We think that we can make, in the near future, a useful contribution to life sciences based on advanced techniques borrowed from computer science.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 4 :</head><label>4</label><figDesc>Fig. 3: DSL computational process</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>mechanism lac_operon i s molecules −− d e c l a r a t i o n o f m o l e c u l e s r e a c t i o n s −− d e c l a r a t i o n o f c h e m i c a l r e a c t i o n s chromosomes −− d e c l a r a t i o n o f chromosomes gene −− d e c l a r a t i o n o f a g e n e −− d e c l a r a t i o n o f o t h e r g e n e s end lac_operon Listing 1: GReg mechanism specification mechanism lac_operon i s molecules lactose , allolactose , lacI , lacZ , lacY , lacA , cAMP , CAP ... end lac_operon Listing 2: Molecules declaration</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 5 :</head><label>5</label><figDesc>Fig. 5: Example of a spelling error in a molecule name (allolactose).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 7 : 1 Fig. 8 :</head><label>718</label><figDesc>Fig. 7: PN model of example in Figure 6 and Listing 8</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Fig. 9 : 8 7</head><label>98</label><figDesc>Fig. 9: State space of example in Figure 6 and Listing 8</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Tool comparison table.</figDesc><table><row><cell>Tool</cell><cell>Cell Illustrator</cell><cell>GINsim</cell><cell>Cytoscape</cell><cell>GReg</cell></row><row><cell>Domain language</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Simulation</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>State space</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Model checking</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Discrete</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Continuous</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Exchange format</cell><cell>CSML</cell><cell>GINML</cell><cell>SBML,. . .</cell><cell>GReg</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head></head><label></label><figDesc>and Listing 8 respectively.</figDesc><table><row><cell></cell><cell></cell><cell>mechanism example i s</cell></row><row><cell></cell><cell></cell><cell>molecules x , y , z</cell></row><row><cell></cell><cell></cell><cell>gene X codes x</cell></row><row><cell>X</cell><cell>Y</cell><cell>s i t e s O : z</cell></row><row><cell></cell><cell></cell><cell>end X</cell></row><row><cell></cell><cell></cell><cell>gene Y codes y</cell></row><row><cell></cell><cell></cell><cell>s i t e s I : x</cell></row><row><cell></cell><cell></cell><cell>end Y</cell></row><row><cell>Z</cell><cell></cell><cell>gene Z codes z</cell></row><row><cell></cell><cell></cell><cell>s i t e s O : y</cell></row><row><cell></cell><cell></cell><cell>end Z</cell></row><row><cell></cell><cell></cell><cell>end example</cell></row><row><cell cols="2">Fig. 6: LRG model of example</cell><cell></cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0">Proceedings of the 2nd International Workshop on Biological Processes &amp; Petri Nets (BioPPN2011) online: http://ceur-ws.org/Vol-724 pp.21-35</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<ptr target="http://www.eclipse.org/modeling/" />
		<title level="m">Eclipse Modeling Project</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">AlPiNA: A Symbolic Model Checker</title>
		<author>
			<persName><forename type="first">D</forename><surname>Buchs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hostettler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Marechal</surname></persName>
		</author>
		<author>
			<persName><surname>Risoldi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Petri Nets&apos;2010: Applications and Theory of Petri Nets</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<meeting><address><addrLine>Braga, Portugal</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="volume">6128</biblScope>
			<biblScope unit="page" from="287" to="296" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">AlPiNA: An Algebraic Petri Net Analyzer</title>
		<author>
			<persName><forename type="first">D</forename><surname>Buchs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hostettler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Marechal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Risoldi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">TACAS&apos;2010: Tools and Algorithms for the Construction and Analysis of Systems</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<meeting><address><addrLine>Paphos, Cyprus</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="volume">6015</biblScope>
			<biblScope unit="page" from="349" to="352" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Symbolic Model Checking: 10 20 States and Beyond</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Burch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">M</forename><surname>Clarke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">L</forename><surname>Mcmillan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">L</forename><surname>Dill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">J</forename><surname>Hwang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information and Computation</title>
		<imprint>
			<biblScope unit="volume">98</biblScope>
			<date type="published" when="1992">1992</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A Modular, Qualitative Modeling of Regulatory Networks Using Petri Nets</title>
		<author>
			<persName><forename type="first">C</forename><surname>Chaouiya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Klaudel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Pommereau</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Modeling in Systems Biology</title>
		<title level="s">Computational Biology</title>
		<meeting><address><addrLine>London</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="page" from="253" to="279" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Qualitative Analysis of Regulatory Graphs: A Computational Tool Based on a Discrete Formal Framework</title>
		<author>
			<persName><forename type="first">C</forename><surname>Chaouiya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Remy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mossé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Thieffry</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Positive Systems</title>
		<title level="s">Lecture Notes in Control and Information Sciences</title>
		<imprint>
			<date type="published" when="2003">2003</date>
			<biblScope unit="volume">294</biblScope>
			<biblScope unit="page" from="830" to="832" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<ptr target="http://www.cytoscape.org/" />
		<title level="m">Cytoscape Consortium and Funding Agencies</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">The Systems Biology Markup Language (SBML): Language Specification for Level 3 Version 1 Core</title>
		<author>
			<persName><forename type="first">M</forename><surname>Hucka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Bergmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hoops</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Keating</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sahle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schaff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Smith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Wilkinson</surname></persName>
		</author>
		<idno type="DOI">10.1038/npre.2010.4959.1</idno>
		<ptr target="http://dx.doi.org/10.1038/npre.2010.4959.1" />
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Data Decision Diagrams for Petri Net Analysis</title>
		<author>
			<persName><forename type="first">J-M</forename><surname>Couvreur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Encrenaz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Paviot-Adet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Poitrenaud</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P-A</forename><surname>Wacrenier</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ATPN&apos;02: International Conference on Application and Theory of Petri Nets</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<imprint>
			<date type="published" when="2002">2002</date>
			<biblScope unit="volume">2360</biblScope>
			<biblScope unit="page" from="101" to="120" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Genetic Regulatory Mechanisms in the Synthesis of Proteins</title>
		<author>
			<persName><forename type="first">F</forename><surname>Jacob</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Monod</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of molecular biology</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="318" to="356" />
			<date type="published" when="1961">1961</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Systems Biology Graphical Notation: Process Description language Level 1</title>
		<author>
			<persName><forename type="first">S</forename><surname>Moodie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Le Novere</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Demir</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Mi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Villeger</surname></persName>
		</author>
		<idno type="DOI">10.1038/npre.2011.3721.4</idno>
		<ptr target="http://dx.doi.org/10.1038/npre.2011.3721.4" />
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">A Versatile Petri Net Based Architecture for Modeling and Simulation of Complex Biological Processes</title>
		<author>
			<persName><forename type="first">M</forename><surname>Nagasaki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Doi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Matsuno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Miyano</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Genome Informatics</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="180" to="197" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Logical Modelling of Regulatory Networks with GINsim 2.3</title>
		<author>
			<persName><forename type="first">A</forename><surname>Naldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Berenguier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Fauré</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Lopez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Thieffry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Chaouiya</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Biosystems</title>
		<imprint>
			<biblScope unit="volume">97</biblScope>
			<biblScope unit="issue">2</biblScope>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Naldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Chaouiya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Thieffry</surname></persName>
		</author>
		<ptr target="http://gin.univ-mrs.fr/" />
		<title level="m">GINsim</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Cytoscape: a Software Environment for Integrated Models of Biomolecular Interaction Networks</title>
		<author>
			<persName><forename type="first">P</forename><surname>Shannon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Markiel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Ozier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">S</forename><surname>Baliga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">T</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ramage</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Amin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Schwikowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ideker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Genome Research</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Regulatory Networks seen as Asynchronous Automata: a Logical Description</title>
		<author>
			<persName><forename type="first">R</forename><surname>Thomas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Theoretical Biology</title>
		<imprint>
			<biblScope unit="volume">153</biblScope>
			<biblScope unit="page" from="1" to="23" />
			<date type="published" when="1991">1991</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">The State Explosion Problem</title>
		<author>
			<persName><forename type="first">A</forename><surname>Valmari</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Lectures on Petri Nets I: Basic Models, Advances in Petri Nets</title>
				<imprint>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="429" to="528" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
