On the Ontological Foundations of Cellular Development

Patryk BUREK a, Nico SCHERF b,c,1 and Heinrich HERRE d,1

a Institute of Computer Science, Faculty of Mathematics, Physics and Computer Science, Marii Curie-Sklodowskiej University, Lublin, Poland
b Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, School of Medicine, TU Dresden, Dresden, Germany
c Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
d Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig, Leipzig, Germany

Abstract. Time-lapse microscopy is a principal tool for unravelling how cells form and maintain organisms. The complexity of cellular dynamics demands a conceptual architecture, a solid theoretical foundation that supports the integration of knowledge obtained across experiments and theories. In this work, we outline the ontological foundation of cellular genealogies, a key concept for describing and representing cellular development. We build the conceptual framework following the onto-axiomatic method: we first analyse the domain within the context of a top-level ontology (GFO). The resulting domain specification provides the basis for a conceptualisation in which we introduce concepts and relations. From this conceptualisation, we then construct model-structures adhering to the principles of model theory. Finally, we elaborate axioms based on these model-structures. The developed framework provides the fundamental concepts underlying a Cell Tracking Ontology (CTO) that supports the extraction and integration of biological knowledge from systems-level experiments across different types of observations at the single-cell level.

Keywords. Knowledge management, Ontology of biological reality, Theories of Developmental Biology, Microscopy, Time-lapse imaging, Cell tracking

1. Introduction

Cellular dynamics unfolding in space and time organise and shape multicellular life as it develops from a single fertilised egg into a complex organism. After development, cellular processes maintain the organism during its lifetime (tissue homeostasis and regeneration). To fully understand how cells build and maintain structures, we must be able to observe cellular dynamics and cellular states in experiments [1]. One milestone was the reconstruction of the embryonic lineage tree of the nematode Caenorhabditis elegans using microscopy [2]. From these roots, modern fluorescence microscopy has become a powerful tool for resolving the dynamics of thousands of cells together with readouts of cellular states by fluorescent labels [3]. Light-sheet microscopy in particular [4] has enabled us to record four-dimensional (4D) movies of a range of developing embryos, from the fruit fly Drosophila melanogaster [5] to mammalian model organisms such as the mouse [6]. Complementary to recording cellular dynamics by microscopy, new genetic methods deliver single-cell atlases of gene expression in developing embryos. These measurements yield detailed information on the genetic state of single cells across development [7], although their resolution in space and time is very coarse. Thus, a critical question in computational biology is how to integrate data across experimental modalities (e.g. connecting time-lapse imaging with single-cell sequencing) and across experiments (e.g. imaging of the same specimen in several labs) [1]. How can we extract knowledge from such data collections? To this end, we also need to develop and refine our concepts and theories to make sense of the intricate patterns we observe [8,9].

1 Corresponding Authors: nico.scherf@tu-dresden.de, heinrich.herre@imise.uni-leipzig.de

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
As state-of-the-art microscopy becomes widely available to the biology community [10], we concurrently need to establish structured and general schemes [11] to annotate and share tracking results. These annotations should rest on a solid theoretical foundation: as pointed out in [12], the underlying terminology and formal concepts should themselves be regarded as theories about the biological world. Here, we develop a conceptual architecture that supports integration and interoperability in the field of cell tracking experiments. We discuss the concept of Cellular Genealogy [13] (or Cell Lineage) as a fundamental notion for the development of the Cell Tracking Ontology [14], an ontology designed for integrating data obtained from cell tracking experiments.

2. The Ontology of Cellular Genealogies

First, we define the notion of cellular genealogy and introduce its essential subtypes.

2.1. Cell-Collective Genealogies

We consider an individual cell as a material object; hence it has a lifetime, and since cells may divide and eventually die, the number of cells within a region under consideration (e.g. a developing organism) changes through time. Consider a time-segment (time-interval) I during which no cell division and no cell death occurs. Then the cells existing during I form a collective Coll(I) that can be considered a continuant through I [15]2. At times when the number of cells changes, new cells may appear and cells may disappear (i.e. die). Consider the life of an organism Org from fertilisation to death. Org starts as a single cell, the zygote, and develops into a multicellular structure that exists for some time in a dynamic equilibrium (e.g. cells are replenished). After that time, Org dies, i.e. its structures dissolve. We divide the lifetime of Org, LifeT(Org), into a sequence of non-overlapping time-intervals I(0), ...
I(n) such that the following conditions are satisfied:
(1) The intervals I(m) have a first point (they are left-closed) but no last point (they are right-open). More precisely, they have the form [a(m), a(m + 1)), denoting the set {c : a(m) ≤ c < a(m + 1)}, where 0 ≤ m ≤ n. Further, LifeT(Org) = ⋃{I(m) : 0 ≤ m ≤ n}.
(2) Let Coll(I(k)) be the set of cells existing during I(k); then no cell death or division is completed within the interval I(k), for k < n. Further, we assume that Coll(I(k)) ≠ Coll(I(k + 1)).

These conditions imply further properties. From Coll(I(k)) to Coll(I(k + 1)) the set of existing cells changes, and we consider two ways in which this happens: cell division and cell death. If a cell c ∈ Coll(I(k)) divides, this process ends with two daughter cells that start their existence at the left boundary of the interval I(k + 1). Analogously, if a cell undergoes cell death during I(k), this process ends at the left boundary of I(k + 1). Writing Coll(k) for Coll(I(k)), the definition of the genealogy CollGen(c(0)) generated by the zygote c(0) must then specify which cells of Coll(k) are related to which cells of Coll(k + 1). To this end, we introduce the relation div(x, y, z): a cell x of Coll(k) undergoes cell division during I(k), resulting in two daughter cells y and z that start their existence at the left boundary of I(k + 1). We also introduce the relation id(x, y), stating that x belongs to Coll(k), y belongs to Coll(k + 1), and both cells are identical. We further say that a cell x in Coll(k) has a successor cell y in Coll(k + 1), denoted by cell_succ(x, y), if y is either a daughter cell of x or identical with x. The cell-collective genealogy CollGen(c(0)) is then specified by the system CollGen(c(0)) = ({Coll(k) | 0 ≤ k ≤ n}, div(x, y, z), id(x, y)); see Fig. 1a for an example.

2 We stipulate that a cell collective Coll preserves the number and the identity of the cells contained in Coll.

Figure 1.
Different conceptual granularities of cellular genealogies: (a) Cell-Collective-Genealogies indicating cell division and cell death as well as the invariance intervals in time. (b) Cell-Situation-Genealogies capture spatial relations between cells in a given context (the dashed outline shows the convex hull). (c) Cell-Process-Genealogies describe cells as full spatio-temporal processes and their interactions.

We assume that for any organism there exists a uniquely determined cell-collective genealogy. We now sketch a proof of this claim. Let t(0) be the starting time-point of the organism (i.e. the time-point at which the zygote comes into existence). With every time-point t of the organism's life we associate the set Cells(t) of cells existing at t. Now let t(1) be the latest time-point (> t(0)) such that no cells are born or die during the time-interval [t(0), t(1)), and at t(1) either a cell division or a cell death occurs; t(1) is then the starting point of the next time-interval. The birth or death of a cell marks the end-point of a process (a division or a dying) that takes place during [t(0), t(1)) and ends at the time-point t(1). Assume we have carried out this construction up to the time-point t(k). Repeating the procedure from t(k) onward, there is a greatest time-point t(k + 1) > t(k) such that no cells are born or die during [t(k), t(k + 1)), and at t(k + 1) a cell birth or death happens. A finite number of such steps t(0), ..., t(n) yields a sequence of time-intervals [t(0), t(1)), [t(1), t(2)), ..., [t(k), t(k + 1)), ..., [t(n - 1), t(n)]3 which satisfies the desired conditions4.

We conceptually divide the whole life cycle of an organism (the ontogeny) into various phases (e.g. embryogenesis, growth, ageing) as it develops from a single zygote into its adult form, ages and finally dies [16]. As outlined, for every cell collective x there is a uniquely determined time-interval I such that no changes occur during I.
We call this time-interval the invariance interval of the cell collective; it contains its left boundary but not its right boundary. The lifetime of every cell in the collective includes the invariance interval as a temporal part. There is a successor relation between cell collectives: the cell collective y is a successor of the collective x if the right boundary of the invariance interval of x coincides with the left boundary of the invariance interval of y. Further, the cells of every collective Coll(I) can be grouped into subsets. During development, certain groups of cells may correspond to the development of specific structures, e.g. organs such as the heart, brain or digestive tract. Understanding the structure of these sub-genealogies is an important research topic in developmental biology. The full sub-genealogy of a particular cell c that is a member of a cell collective contains all cells in the transitive closure of the successor relation starting from c. Every cell c thus generates a uniquely determined cell-collective genealogy, denoted by CollGen(c). A cell lineage indicates the developmental history of a tissue, organ or organism from earlier stages such as the fertilised egg.

2.2. Cell-Situation-Genealogies (SitGen)

A cell-situation genealogy is an extension of a cell-collective genealogy: we start with the system CollGen(c(0)) and extend every collective Coll(k) of cells into an object-situation Sit(k). Sit(k) contains precisely the cells of Coll(k) as objects; it has the time-frame I(k) and a specified space-frame that contains at least the convex hull of the objects in Coll(k) (see the example in Fig. 1b). We are free to add relations between the cells, or predicates of the cells, in Coll(k). The set of these relations and predicates is inherently open-ended and defines the type of the corresponding situation: a signature Σ determines the situation type.
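To illustrate how a situation separates the admitted symbols (the signature Σ) from their interpretation over the cells of a collective, here is a minimal Python sketch under simplifying assumptions of our own. The class `Situation`, the helper `collective_as_situation`, and the encoding of relations as sets of cell tuples are hypothetical choices for illustration; time-dependence of relations is ignored, and the contact facts in the usage example are invented, since the paper does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class Situation:
    """Hypothetical encoding of a cell situation Sit(k).

    cells:     the cells of the underlying collective Coll(k) (the skeleton)
    signature: admitted relation symbols mapped to their arity (the type)
    relations: for each symbol, the set of cell tuples for which it holds
    """
    cells: frozenset
    signature: dict
    relations: dict = field(default_factory=dict)

    def validate(self) -> None:
        """Raise ValueError if an interpretation does not fit the signature."""
        for symbol, tuples in self.relations.items():
            if symbol not in self.signature:
                raise ValueError(f"{symbol!r} is not in the signature")
            for args in tuples:
                if len(args) != self.signature[symbol]:
                    raise ValueError(f"{symbol}{args} has the wrong arity")
                if not set(args) <= self.cells:
                    raise ValueError(f"{symbol}{args} uses a cell outside the collective")


def collective_as_situation(cells) -> Situation:
    """A cell collective viewed as the simplest situation: only '=' is admitted."""
    cells = frozenset(cells)
    return Situation(cells, {"=": 2}, {"=": {(c, c) for c in cells}})


# Usage: the four cells of Coll(3) in the Fig. 1 example; the facts are invented.
coll3 = collective_as_situation({"c3", "c4", "c5", "c6"})
sit3 = Situation(
    coll3.cells,
    {"=": 2, "contact": 2, "between": 3},
    {"contact": {("c3", "c4"), ("c4", "c5")}, "between": {("c3", "c4", "c5")}},
)
coll3.validate()
sit3.validate()  # passes; a tuple mentioning a cell such as "c0" would raise ValueError
```

Two situations have the same type exactly when their `signature` dictionaries agree, which is one way to read the stability condition for situation genealogies.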
The signature Σ contains the admitted relational and predicate symbols. In the simplest case, Σ is fixed for all situations of the genealogy; however, it may be necessary to introduce new relations and predicates during development. We consider a cell collective to be the simplest cell situation, as its signature contains only the equality symbol =.

2.3. Cell-Process-Genealogies

Some properties of cells, such as velocity or morphodynamics (changes in cell shape) along a trajectory, cannot be attributed to cells as objects and can only be captured by introducing cell processes (cf. [17]). The simplest process genealogy is obtained by transforming CollGen into a branching process, denoted ProcCollGen(c(0)). This process is determined by applying the integration axiom of GFO to transform every cell of Coll(k) into a corresponding process. We can then define Proc(k) as the integration of all processes Proc(c) for c in Coll(k). The processes in Proc(k) are usually not isolated threads, because there can be meaningful interactions between them (e.g. cells exchanging signals by direct contact or by diffusible signalling molecules). Hence, there are various versions of potential process genealogies. Analogously, we may transform any cell situation Sit(k) into a process situation, called a situoid. The investigation and classification of possible process genealogies is a research field of its own.

3 With the exception of the last interval, all intervals are left-closed and right-open.
4 We assume that time-points are represented by real numbers. A complete proof of the conditions uses the completeness (continuity) of the real numbers, in particular the fact that every non-empty set of real numbers that is bounded above has a least upper bound.

Fig. 1 illustrates these ideas with a simple example showing part of a cell-collective genealogy.
The invariance intervals are [t0, t1), [t1, t2), [t2, t3), [t3, t4), [t4, t5), [t5, t6), [t6, t7). The cell collectives associated with these invariance intervals are:
Coll(0) = ({c0}, [t0, t1)),
Coll(1) = ({c1, c2}, [t1, t2)),
Coll(2) = ({c1, c3, c4}, [t2, t3)),
Coll(3) = ({c3, c4, c6, c5}, [t3, t4)),
Coll(4) = ({c3, c6, c5}, [t4, t5)),
Coll(5) = ({c3, c5}, [t5, t6)),
Coll(6) = ({c3}, [t6, t7)).

We can extend any of these cell collectives with relations. Here we consider an extension of the cell collective Coll(3). The collective only identifies the cells it contains; it does not specify any relations between them. As an example, we introduce the relations contact(x, y) := the cells x and y are in contact, and between(x, y, z) := y is between x and z. The time-frame of Sit(3) coincides with the invariance interval of Coll(3), whereas the space-frame of the situation must be specified additionally (e.g. the dashed region in Fig. 1b). Since cells typically move during the invariance interval, their relative positions may change (see Fig. 1c). The snapshots of such situations, called presentic situations, are presentic entities, and a situation can be accessed only through its snapshots. A presentic situation cannot adequately describe processes. Consider a cell c as an object that has a lifetime and thus persists through time: it is the same cell at every time-point of its lifetime. A cell c may move through space, and this movement (its trajectory) is a process (Fig. 1c). The presentic location of c at a time-point is an external attributive of c. The trajectory of a cell, by contrast, cannot be an attributive of c, because the path itself has no presentic nature: if we consider a snapshot of the trace, its form disappears. To model these aspects, we replace each cell in the situation by its corresponding process, obtaining a processual situation.

3.
Formal Axiomatisation of Cellular Genealogies

In this section, we present a selection of fundamental axioms about cell-collective genealogies. We develop these axioms using the onto-axiomatic method [18], which integrates Hilbert's axiomatic approach [19] with a top-level ontology as a general analytical framework. Here, we adopt some axioms from GFO [20] and adapt them to the domain of cell development and, more generally, to developmental biology. We introduce axioms at two levels: the level of GFO axioms, which are essential for understanding the domain-specific notions, and the level of cell biology. Furthermore, we introduce the idea of trans-level axioms connecting concepts across different levels of abstraction.

3.1. GFO-Level

3.1.1. Objects and Processes

A material object is a spatio-temporal individual that occupies space and persists through time, being wholly present at every time-point of its lifetime. For every material object Obj and every time-point t of the object's lifetime, there is a snapshot P of Obj at t, which we express by snap(Obj, P, t). These snapshots differ from time-point to time-point, yet they have something in common, a similarity, which is captured by a universal Univ(Obj). Furthermore, every instance of Univ(Obj) stands for Univ(Obj). If we consider, for example, an individual cat C and take into account Univ(C), then any instance of Univ(C) stands for the whole universal Univ(C). We may also imagine a prototypical cat representing this universal. The universal and a corresponding prototype express the phenomenon of persistence, whereas the instances themselves may change their properties through time. In contrast to an object, an individual process P evolves through time and is never wholly present at a time-point. The restrictions of P to time-points of P's temporal extension are called process boundaries. A process boundary of a process P is an entity of presentic nature.
However, it can never represent the entire process P. Processes and objects thus exhibit a fundamental duality in spatio-temporal reality. Material objects and processes are connected in a particular way, expressed by the integration axiom of GFO: for every material object Obj there exists a process Proc(Obj) such that the snapshots of Obj coincide with the process boundaries of Proc(Obj). Proc(Obj) is a minimal process associated with Obj, because its process boundaries exhibit only those properties that are genuine properties of the object. Genuine properties have a presentic nature and are independent of any process. A process boundary of Proc(Obj) contains a snapshot of the object Obj. We can extend the minimal process Proc(Obj) by adding further properties that depend on the process, e.g. the velocity of a moving object, to its process boundaries.

3.1.2. Situations

An object-situation (simply called situation in this work) is composed of objects that are connected by relators. A situation is framed by a temporal interval and a space region. The material objects contained in a situation Sit form the skeleton of Sit. There is a certain freedom in specifying the time frame and the space frame of a situation. For every situation, we may consider snapshots, called presentic situations (PSit): an object-situation exhibits a presentic situation at every time-point of its time frame. We generally assume that a presentic entity is a dependent entity. It is either a snapshot of an object (or of an object-situation) or a part of a time-boundary of a process.

3.1.3. A selection of axioms

We select some axioms as examples; a complete system is presented in [20]. Axioms are formulated over signatures providing symbols for predicates and relations.
We now fix the signature Σ(0)5 = {Chr(x), Obj(x), Proc(x), Pres(x), SReg(x), TimeExt(x), exhib(x, y, z), lifetime(x, y), occ(x, y), procbd(x, y), procbd(x, y, z), tempext(x, y), temprestr(x, y, z), tp(x, y)}, where:
Chr(x) := x is a chronoid;
Obj(x) := x is an object (in the sense of a material object, being a continuant);
Proc(x) := x is a process;
Pres(x) := x is a presential;
SReg(x) := x is a space region;
TimeExt(x) := x is a temporally extended entity;
exhib(x, y, z) := the material object x exhibits the entity y at the time-point z;
lifetime(x, y) := x is the lifetime of the object y;
occ(x, y) := x occupies the space region y;
procbd(x, y) := x is a process boundary of the process y;
procbd(x, y, z) := x is a process and y is the process boundary of x at the time-point z;
tempext(x, y) := x is the temporal extension of the process y;
temprestr(x, y, z) := x is the temporal restriction of the process y to z, where z is a time-interval that is a temporal part of the temporal extension of y (or, in the limiting case, a time-point of it);
tp(x, y) := x is a time-point of the interval y.

5 Σ(0) is a minimal signature for the material ontological region according to GFO and must be extended in various directions. We consider biology as belonging to the material stratum of reality.

We select some axioms.6

$$\forall x\,(\mathrm{TimeExt}(x) \leftrightarrow \mathrm{Obj}(x) \lor \mathrm{Proc}(x)) \tag{1}$$
$$\forall x\,(\mathrm{Obj}(x) \rightarrow \exists y\,(\mathrm{occ}(x, y) \land \mathrm{SReg}(y))) \tag{2}$$
$$\forall x\,(\mathrm{Obj}(x) \rightarrow \exists y\,\mathrm{lifetime}(y, x)) \tag{3}$$
$$\forall x\,\forall y\,(\mathrm{lifetime}(x, y) \rightarrow \mathrm{Chr}(x)) \tag{4}$$
$$\forall x\,\forall y\,\forall t\,(\mathrm{Obj}(x) \land \mathrm{lifetime}(y, x) \land \mathrm{tp}(t, y) \rightarrow \exists z\,(\mathrm{Pres}(z) \land \mathrm{exhib}(x, z, t))) \tag{5}$$
$$\forall x\,(\mathrm{Proc}(x) \rightarrow \exists y\,\mathrm{tempext}(y, x)) \tag{6}$$
$$\forall x\,\forall y\,(\mathrm{procbd}(x, y) \leftrightarrow \mathrm{Proc}(y) \land \exists t\,\exists s\,(\mathrm{tp}(t, s) \land \mathrm{tempext}(s, y) \land \mathrm{temprestr}(x, y, t))) \tag{7}$$

Axiom (7) defines the process boundary of a process at a time-point t of its temporal extension as the restriction of the process to this time-point. We introduce the following integration law7: for every material object Obj there exists a process Proc(Obj) such that the snapshots of Obj coincide with the process boundaries of Proc(Obj).
This process exhibits at its boundaries only genuine properties (attributives), i.e. properties that have a presentic nature and are independent of any process:

$$\forall x\,(\mathrm{Obj}(x) \rightarrow \exists y\,(\mathrm{Proc}(y) \land \forall z\,\forall t\,(\mathrm{exhib}(x, z, t) \leftrightarrow \mathrm{procbd}(y, z, t)))) \tag{8}$$

There is a difference between snapshots of objects and process boundaries: snapshots are taken of objects, never of processes. Presentials have two sources: they can be snapshots of objects (in this case we say that an object Obj exhibits a presential at a time-point of its lifetime), or they can be contained in boundaries of processes. In some cases, a process boundary is the same as the snapshot of an object participating in this process. In general, however, a boundary of a process P contains more properties than the corresponding boundary of the process Proc(Obj) associated with a participating object Obj. If an object Obj participates in a process P, then Proc(Obj) is a minimal process layer of P [21]. We say that an object Obj participates in a process P if every snapshot of Obj is contained within a process boundary of P, and we introduce the relation partic(x, y) := x is an object, y is a process, and x participates in y. Using snapshot(x, y) := x is a snapshot of the object y, we state:

$$\forall x\,\forall y\,(\mathrm{partic}(x, y) \rightarrow \mathrm{Obj}(x) \land \mathrm{Proc}(y) \land \forall z\,(\mathrm{snapshot}(z, x) \rightarrow \exists u\,(\mathrm{procbd}(u, y) \land \mathrm{part\_of}(z, u)))) \tag{9}$$
$$\forall x\,\forall y\,(\mathrm{snapshot}(x, y) \rightarrow \mathrm{Obj}(y) \land \exists s\,\exists t\,(\mathrm{lifetime}(s, y) \land \mathrm{tp}(t, s) \land \mathrm{exhib}(y, x, t))) \tag{10}$$

6 Establishing a fully developed system of axioms is a research topic of its own. Here we present only a selection of particularly important axioms.
7 The integration law is a unique condition distinguishing GFO from other current top-level ontologies.

3.2. Cell biology level

Cells are considered living entities, in contrast to inanimate entities such as stones. However, there is no clear consensus on where to draw the boundary between the animate and the inanimate. Typical defining properties of life include metabolism, adaptivity and interaction with the environment, self-organisation, reproduction, heredity, and growth.
Any system satisfying these conditions must have at least the following basic properties: it should have a boundary demarcating it from the environment, and it should have inner parts. It should further be able to sense and interact with its environment (cf. autopoiesis, an attempt to define living matter using concepts from general systems theory such as self-organisation). In biology, the cell is the simplest system satisfying these assumptions. It is an open problem whether these conditions, though necessary for the definition of life, are also sufficient for determining the essence of the animate. A minority of biologists believed that an additional life force is needed to achieve a complete picture of the world [22]. The self-organised development of a cellular genealogy, starting from a zygote, seems to be an essential feature of the animate. Hence, an ontology of biology should regard the existence of cellular genealogies as one of the basic features demarcating biology from other fields of natural science, such as physics or chemistry. Thus, we include the relevant concepts of cellular genealogies, such as cells Cell(x), cell collectives Coll(x), cell division, cell death, cell situations Sit(x) and the corresponding processes, among the basic notions of life. The formalisation of these notions uses the signature Σ(1) = {Cell(x), Coll(x), CollGen(x), Sit(x), PSit(x), Dead(x), id(x, y), div(x, y, z), invar(x, y), member_of(x, y), daughter_of(x, y)}, where Cell(x) := x is a cell; Coll(x) := x is a cell collective; and CollGen(x) := x is a cell-collective genealogy. A cell collective has members: member_of(x, y) := the cell x is a member of the collective y. The invariance interval of a collective determines its lifetime: invar(x, y) := x is the invariance interval of the collective y. We distinguish two kinds of time entities, time-points and time-intervals, where a time-point is an element of a time-interval.
We use two types of time-intervals: closed ones (which have a first point and a last point) and left-closed, right-open ones (which have a first point but no last point). Notable examples are cell division, cell death and the various structural and morphological properties of cells. Below, we present a selection of axioms.8 These axioms can easily be transformed into pure first-order formulas, as exemplified for axiom (14).9

$$\forall x\,(\mathrm{Cell}(x) \rightarrow \mathrm{Obj}(x)) \tag{11}$$
$$\forall x\,\forall y\,\forall z\,(\mathrm{div}(x, y, z) \rightarrow \mathrm{Cell}(x) \land \mathrm{Cell}(y) \land \mathrm{Cell}(z) \land y \neq z \land x \neq y \land x \neq z) \tag{12}$$
$$\forall x\,(\mathrm{Invar}(x) \leftrightarrow \exists y\,(\mathrm{Coll}(y) \land \mathrm{invar}(x, y))) \tag{13}$$
$$\forall x\,(\mathrm{Invar}(x) \rightarrow x \text{ has a first time-point} \land x \text{ has no last time-point}) \tag{14}$$
$$\forall x\,(\mathrm{Coll}(x) \rightarrow \exists y\,\mathrm{invar}(y, x)) \tag{15}$$
$$\forall x\,\forall y\,\forall z\,\forall u\,(\mathrm{Coll}(x) \land \mathrm{member\_of}(y, x) \land \mathrm{lifetime}(z, y) \land \mathrm{invar}(u, x) \rightarrow \mathrm{part\_of}(u, z)) \tag{16}$$

8 A more complete axiomatisation of cellular genealogies is work in progress.
9 Let x = [a, b) and point(u, x) := u is a point of the interval x. Then "x has a first time-point" := ∃v (point(v, x) ∧ ∀w (point(w, x) → v ≤ w)), and "x has no last time-point" := ¬∃v (point(v, x) ∧ ∀u (point(u, x) → u ≤ v)).

A cell situation Sit contains a cell collective forming the skeleton of the situation, together with various relations between the cells, given by the situation's signature. As an example of a signature, consider Σ = (contact(x, y), between(x, y, z), equidistance(x, y, u, v)). For every situation S there exists a cell collective C such that the members of C are exactly the objects in S:

$$\forall x\,(\mathrm{Sit}(x) \rightarrow \exists y\,(\mathrm{Coll}(y) \land \forall z\,(\mathrm{member\_of}(z, y) \leftrightarrow \mathrm{obj\_in}(z, x)))) \tag{17}$$

Here we assume that situations are spanned by the cells of a collective.

$$\forall x\,(\mathrm{Sit}(x) \rightarrow \exists y\,(\mathrm{Coll}(y) \land \forall z\,(\mathrm{member\_of}(z, x) \leftrightarrow \mathrm{obj\_in}(z, y)))) \tag{18}$$

Since the cells of a situation can move during the situation's time-frame, the relations between them may depend on time; e.g. two cells c and d may be in contact at a time-point t and separated at another time-point t'. Hence, the relations (e.g.
contact(x, y)) must be extended by a time argument, as in contact(x, y, t). The time-frame of a situation S is the invariance interval of the collective contained in S. The relation coll_succ(x, y) states that x and y are cell collectives and y is the successor of x. There exists exactly one cell collective without a predecessor and exactly one cell collective without a successor. A cell-collective genealogy CollGen is a temporally extended structure consisting of a sequence of invariance intervals and the cell collectives associated with these intervals: CollGen = (Coll_1, ..., Coll_m, inv_1, ..., inv_m, coll_succ(x, y), id(x, y), div(x, y, z), Dead(x)). Such a genealogy is thus specified by a sequence of intervals Int(CollGen) = (inv_1, ..., inv_m), the collectives Coll_1, ..., Coll_m, the coll_succ relation, and (at least) the two relations id(x, y) and div(x, y, z) between the cells of a collective and the cells of its successor collective. By adding relations to the cell collectives of a genealogy, we obtain the notion of a cell-situation genealogy, denoted by SitGen. A situation genealogy is called stable if its signature is the same for every situation of the genealogy. A many-sorted model-structure of a cell-collective genealogy can be specified as follows: CollGen = ((L, Inv_1, ..., Inv_n, <), Cell, Coll_1, ..., Coll_n, lifetime(x, y), cell_succ(x, y), id(x, y), div(x, y, z)). Here, Cell(x) is a predicate whose extension10 contains all cells occurring during the full temporal extension of the genealogy. The extensions of the Coll_i are subsets of the extension of the predicate Cell(x). (L, <) is a dense linear ordering representing the set of time-points, and the Inv_i are left-closed, right-open intervals of (L, <).

10 The extension of a predicate P(x) is the set of all entities satisfying this predicate. This notion can be explicated based on a model-structure established according to the methods of logic and model theory [23].
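As an illustration only, a finite fragment of such a model-structure can be mimicked in Python. The sketch below encodes the Fig. 1 collectives Coll(0), ..., Coll(6), derives id and cell_succ, checks finite analogues of some axioms stated in this section (12, 24, 26), and computes the sub-genealogy of a cell as the transitive closure of cell_succ. All function names are hypothetical. The div triples are our reading of the set differences between consecutive collectives (at each division step exactly one cell disappears and two new cells appear), and the time-points t_k = k are placeholders, since Fig. 1 gives no numeric times. A dense order (L, <) cannot be enumerated, so intervals are represented only by their endpoints.

```python
# Coll(0), ..., Coll(6) of the Fig. 1 example.
FIG1_COLLECTIVES = [
    {"c0"},
    {"c1", "c2"},
    {"c1", "c3", "c4"},
    {"c3", "c4", "c5", "c6"},
    {"c3", "c5", "c6"},
    {"c3", "c5"},
    {"c3"},
]
# div(x, y, z) triples (assumption: read off the set differences above).
FIG1_DIV = {("c0", "c1", "c2"), ("c2", "c3", "c4"), ("c1", "c5", "c6")}


def invariance_intervals(event_times, start, end):
    """Cut [start, end) at the event times into intervals [t(k), t(k+1)), as (a, b) pairs."""
    cuts = sorted({t for t in event_times if start < t < end})
    bounds = [start, *cuts, end]
    return list(zip(bounds, bounds[1:]))


def cell_successors(collectives, div):
    """Derive id and cell_succ pairs; raise ValueError if a selected axiom fails."""
    daughters = {}
    for x, y, z in div:
        if len({x, y, z}) != 3:  # axiom (12): pairwise distinct cells
            raise ValueError(f"div{(x, y, z)} needs three distinct cells")
        daughters[x] = {y, z}
    ident, succ = set(), set()
    for k, (before, after) in enumerate(zip(collectives, collectives[1:])):
        if before == after:  # condition Coll(k) != Coll(k + 1)
            raise ValueError(f"Coll({k}) and Coll({k + 1}) coincide")
        for c in before & after:  # cells persisting across the boundary
            ident.add((c, c))
            succ.add((c, c))
        unexplained = after - before  # cells that start at this boundary
        for x in before - after:  # cells that divide or die at this boundary
            kids = daughters.get(x, set())
            if not kids <= unexplained:  # cf. axiom (26): daughters start together
                raise ValueError(f"daughters of {x} do not start at boundary {k + 1}")
            succ.update((x, y) for y in kids)
            unexplained -= kids
        if unexplained:
            raise ValueError(f"{sorted(unexplained)} appear without a division")
    return ident, succ


def membership_spans(collectives):
    """Map each cell to its (first, last) collective index (cf. axiom (24): no gaps)."""
    spans = {}
    for cell in set().union(*collectives):
        idx = [k for k, coll in enumerate(collectives) if cell in coll]
        if idx != list(range(idx[0], idx[-1] + 1)):
            raise ValueError(f"{cell} leaves and re-enters the genealogy")
        spans[cell] = (idx[0], idx[-1])
    return spans


def sub_genealogy(cell, succ):
    """All cells reachable from `cell` via the transitive closure of cell_succ."""
    seen, stack = {cell}, [cell]
    while stack:
        x = stack.pop()
        for a, b in succ:
            if a == x and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


intervals = invariance_intervals(range(1, 7), 0, 7)  # placeholder times t_k = k
ident, succ = cell_successors(FIG1_COLLECTIVES, FIG1_DIV)
print(len(intervals), sorted(sub_genealogy("c2", succ)))  # 7 ['c2', 'c3', 'c4']
```

The sub-genealogy of the zygote c0 contains every cell of the example, matching the claim that each cell generates its own uniquely determined genealogy CollGen(c).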
By adding further relations, presented formally by a signature Σ = (r(1), ..., r(n)), we obtain a model-structure for a situation genealogy SitGen: SitGen = (CollGen, int(Σ)), where int(Σ) is the interpretation of the relation symbols of Σ in the corresponding cell collectives Coll_i.11

3.2.1. Description of the relations

We introduce the following relations:
lt(x) := the lifetime of the cell x, defined by the condition lt(x) = y ↔ lifetime(y, x);
daughter(x, y) := ∃z (div(x, y, z) ∨ div(x, z, y)), i.e. y is a daughter cell of x;
Inv_i(x) := x is an element of the i-th invariance interval;
Init_i(x) := x is the initial time-point of the interval Inv_i;
Init(x) := ⋁{Init_i(x) | 1 ≤ i ≤ n};
init(x, y) := x is a cell and y is the initial time-point of the cell's lifetime;
Coll_i(x) := x is an element of the i-th cell collective.

The definition of cell_succ(x, y) uses the formulas φ_i(x, y) := Coll_i(x) ∧ Coll_{i+1}(y) ∧ (id(x, y) ∨ daughter(x, y)); then cell_succ(x, y) := ⋁{φ_i(x, y) | 1 ≤ i ≤ n - 1}.

3.2.2. Selection of axioms

(L, <) is a dense linear ordering, and the Inv_i, i = 1, ..., n, are intervals satisfying the following conditions:

$$\forall x\,(L(x) \leftrightarrow \mathrm{Inv}_1(x) \lor \dots \lor \mathrm{Inv}_n(x)) \tag{19}$$
$$\exists x\,\exists y\,\forall u\,(\mathrm{Inv}_i(u) \leftrightarrow x \le u < y), \quad i = 1, \dots, n \tag{20}$$
$$\bigwedge \{\, \neg \exists x\,(\mathrm{Inv}_i(x) \land \mathrm{Inv}_j(x)) \mid i \neq j \,\} \tag{21}$$
$$\forall x\,\forall y\,(\mathrm{Inv}_i(x) \land \mathrm{Inv}_{i+1}(y) \rightarrow x < y), \quad i = 1, \dots, n-1 \tag{22}$$
$$\forall x\,(\mathrm{Coll}_i(x) \rightarrow \mathrm{Cell}(x)) \tag{23}$$
$$\forall x\,(\mathrm{Coll}_i(x) \rightarrow \mathrm{Inv}_i \subseteq \mathrm{lt}(x)) \tag{24}$$
$$\forall x\,\forall y\,(\mathrm{cell\_succ}(x, y) \rightarrow \mathrm{daughter}(x, y) \lor \mathrm{id}(x, y)) \tag{25}$$
$$\forall x\,\forall y\,\forall z\,(\mathrm{Coll}_i(x) \land \mathrm{div}(x, y, z) \rightarrow \exists u\,(\mathrm{Init}(u) \land \mathrm{init}(y, u) \land \mathrm{init}(z, u))) \tag{26}$$

A cell-situation genealogy SitGen is based on a cell-collective genealogy CollGen, extended by adding relations to each of its cell collectives.

4. The experimental framework and its Formalisation

4.1. Basic conditions: Ontology of Frame-Sequences

In this section, we investigate and analyse cell tracking experiments based on the principle of time-lapse microscopy.
In reality (in vivo as well as in vitro), cells move and change continuously in time and space. We therefore take time-points to be densely ordered: no time-point has a direct successor. In the considered experiments, discrete snapshots of these continuous dynamics are taken. These snapshots provide incomplete information about an individual situation genealogy in the independent reality. Let a given situation genealogy be specified by the structure SitGen = ((L, Inv_1, ..., Inv_n, <), Sit_1, ..., Sit_n, lifetime(x, y), cell_succ(x, y), id(x, y), div(x, y, z), int(Σ)). The time-points at which the snapshots are taken form a finite subset S ⊆ L of the linear ordering (L, <); hence (S, <) is a finite linear ordering that can be indexed by natural numbers. A snapshot at time-point t yields a presentic situation PSit(t), which is called the frame at t and denoted by Fr(t). Every experiment Exp of this type results in a finite sequence Seq(SitGen) = (Fr(t(1)), ..., Fr(t(m))) of frames, called the components of the sequence. This sequence, related to an experiment Exp, is denoted by Seq(Exp). We say that a time-lapse experiment Exp is adequate for the situation genealogy SitGen if for every situation Sit in SitGen there exists a snapshot of Sit in Seq(Exp).12 These sequences are the entities to be investigated. Each picture Fr(k) reflects a snapshot of a situation from SitGen. For the sake of simplicity, we identify the frame Fr(i), being a picture, with the snapshot of the reflected situation. In the following, we fix a sequence Seq(Exp) resulting from a certain experiment. Every frame is a snapshot of a situation; hence a frame is a presentic situation (PSit). Further, every presentic situation in Seq(Exp) contains presentic cells, also called presentials.

11 If r(x, y, z) is a ternary relation symbol in Σ, then an interpretation of r, denoted by int(r), in Coll_i = {a(1), ..., a(n)} is a subset of Coll_i × Coll_i × Coll_i (e.g. the relation between(a, b, c)).
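Once the invariance intervals are known, the adequacy condition can be checked mechanically. The Python sketch below is an illustration under stated assumptions: intervals are given as consecutive (a, b) pairs standing for [a, b), each frame time is assigned to the situation whose interval contains it, and the experiment counts as adequate if every situation receives at least one frame. The function names and the numeric times are hypothetical. The failing case shows why footnote 12 asks for sufficiently small temporal distances between snapshots: a short invariance interval can fall between two frames.

```python
from bisect import bisect_right


def situation_index(intervals, t):
    """Index i of the invariance interval [a, b) that contains the frame time t."""
    starts = [a for a, _ in intervals]
    i = bisect_right(starts, t) - 1
    if i < 0 or not intervals[i][0] <= t < intervals[i][1]:
        raise ValueError(f"time {t} lies outside the genealogy")
    return i


def is_adequate(intervals, frame_times):
    """True if every situation Sit(i) has at least one frame (snapshot) in the sequence."""
    covered = {situation_index(intervals, t) for t in frame_times}
    return covered == set(range(len(intervals)))


# Hypothetical invariance intervals; the third one is short.
iv = [(0.0, 1.0), (1.0, 2.5), (2.5, 2.8), (2.8, 6.0)]
print(is_adequate(iv, [float(k) for k in range(6)]))   # False: [2.5, 2.8) is missed
print(is_adequate(iv, [k * 0.25 for k in range(24)]))  # True
```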
Presentials possess various properties and can be related to other entities. Some properties are inherent to the objects, e.g. the shape (based on metrics) or the number of proteins of a certain type. Others are external to the cells, such as the distance between two cells or the position of a cell in space. Further important relations between two presentic cells are contact(x, y) := the cells x and y are in contact, and relative spatial positions between the cells x and y: for example, y is to the right of x, to the left of x, above or below it, or spatial relations with respect to an (often anatomical) frame of reference (e.g. dorsal, ventral, distal). Spatial relations with more than two arguments are also possible, e.g. between(x, y, z): the cell y is located between the cell x and the cell z. A further example, a relation with four arguments, is equidist(x, y, z, u), meaning that the distance between x and y equals the distance between z and u. Although this set of spatial relations may seem limited, from a biological point of view it already suffices to describe a large class of symmetries that are established or broken [24] during development (e.g. mirror symmetry in bilateral animals). Moreover, from a theoretical perspective, the relations of betweenness and equidistance suffice to establish the whole of elementary plane Euclidean geometry [25].
We emphasise that only presentic properties, which are independent of any process, can be identified in a single frame. Aspects such as the circularity of a cell path or the morphodynamics of a cell (its particular pattern of shape changes) cannot be detected in any individual frame; they are processual rather than presentic properties. A processual property of a cell or a cell-collective can only be derived by analysing a sequence of frames. In essence, a sequence of frames can be assembled into a video from which processual properties can be derived.
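To illustrate the distinction between presentic and processual properties, the following Python sketch represents a frame by hypothetical 2D cell centroids; the coordinates, the cell names and the tolerance `tol` are assumptions for illustration, not part of the framework. Betweenness and equidistance are evaluated on a single frame, using the metric reading between(x, y, z) ⟺ d(x, y) + d(y, z) = d(x, z), which is one standard way to realise Tarski's betweenness primitive in the Euclidean plane. As a processual property we choose, as our own example, the straightness of a cell path (net displacement divided by path length), a measure commonly used in cell-migration analysis; it needs positions of the same cell from at least two frames.

```python
import math

Point = tuple[float, float]  # hypothetical centroid of a presentic cell in one frame


def between(x: Point, y: Point, z: Point, tol: float = 1e-9) -> bool:
    """between(x, y, z): y lies on the segment from x to z (metric reading)."""
    return abs(math.dist(x, y) + math.dist(y, z) - math.dist(x, z)) <= tol


def equidist(x: Point, y: Point, z: Point, u: Point, tol: float = 1e-9) -> bool:
    """equidist(x, y, z, u): the distance between x and y equals that between z and u."""
    return abs(math.dist(x, y) - math.dist(z, u)) <= tol


def straightness(track: list[Point]) -> float:
    """Processual property: net displacement / path length of one cell over a frame sequence."""
    if len(track) < 2:
        raise ValueError("a processual property needs positions from at least two frames")
    path_length = sum(math.dist(p, q) for p, q in zip(track, track[1:]))
    if path_length == 0:
        return math.nan  # undefined for a cell that never moved
    return math.dist(track[0], track[-1]) / path_length


# Presentic relations: evaluated on a single (hypothetical) frame.
frame = {"a": (0.0, 0.0), "b": (1.0, 1.0), "c": (2.0, 2.0), "d": (0.0, 2.0)}
print(between(frame["a"], frame["b"], frame["c"]))               # True
print(between(frame["a"], frame["d"], frame["c"]))               # False
print(equidist(frame["a"], frame["d"], frame["d"], frame["c"]))  # True

# Processual property: needs the same cell tracked over several frames.
print(straightness([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]))  # 1.0 (straight path)
print(straightness([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]))  # 0.0 (closed loop)
```

The tolerance is needed only because centroids are floating-point estimates; a closed, circular path yields a straightness of 0, a value that no single frame could reveal.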
Since the famous work of the photographer Eadweard Muybridge, who studied motion from sequences of static pictures, the method of changing time scales via slow motion or time-lapse (fast motion) has provided many insights into the processual properties of nature, and in particular into the properties of embryogenesis [8,9].
¹² This condition implies that the temporal distances between the snapshots are sufficiently small to capture the relevant information about a cell division.
4.2. Formal Axiomatisation of frame sequences
4.2.1. Some predicates and informal description of axioms
FSeq(x) states that x is a frame sequence; its components are called frames. Every frame is a snapshot of a situation and is therefore a presentic situation (PSit). We introduce a linear ordering between the components of a frame sequence; hence such a sequence can be presented by the structure FSeq = ({F(1), …, F(n)}, <), where F(1) < … < F(n). Let Seq be a frame sequence; a component G of Seq is a successor of a component F of Seq if F < G and there is no component between F and G. In this case we write seq_succ(F, G): G is the sequence-successor of F. We assume that cells occur in every frame, that these cells are presentials, and that every such presentic cell is a snapshot of a uniquely determined cell (with lifetime > 0).
4.2.2. Selection of formal axioms
We first introduce a signature Σ(2) on which the axioms are based:
Fr(x) := x is a frame;
FSeq(x) := x is a frame sequence;
PCell(x) := x is a presentic cell;
PSit(x) := x is a presentic situation;
comp(x, y) := x is a component of the frame sequence y;
< := the linear ordering between the components of a frame sequence;
ipart(x, y) := x is an image-part of the frame y.
$$\forall x\,y\,(\mathit{FSeq}(x) \land \mathit{comp}(y,x) \to \exists z\,(\mathit{Sit}(z) \land \mathit{snapshot}(y,z))) \tag{27}$$
$$\forall x\,y\,(\mathit{comp}(y,x) \to \mathit{Fr}(y) \land \mathit{FSeq}(x)) \tag{28}$$
$$\forall x\,y\,(\mathit{FSeq}(x) \land \mathit{comp}(y,x) \to \exists z\,(\mathit{PCell}(z) \land \mathit{ipart}(z,y))) \tag{29}$$
$$\forall x\,(\mathit{FSeq}(x) \to \exists y\,z\,(\mathit{comp}(y,x) \land \mathit{comp}(z,x) \land \neg\exists u\,(\mathit{comp}(u,x) \land u < y) \land \neg\exists v\,(\mathit{comp}(v,x) \land z < v))) \tag{30}$$
$$\forall x\,(\mathit{FSeq}(x) \to \exists y\,(\mathit{SitGen}(y) \land \forall u\,(\mathit{comp}(u,x) \to \exists v\,(\mathit{sit\_of}(v,y) \land \mathit{snapshot}(u,v))))) \tag{31}$$
Axiom (29) requires every frame of a frame sequence to contain at least one presentic cell as an image-part.¹³ Axiom (30) states that every frame sequence has a first component y and a last component z. Axiom (31) establishes a link between the experiment and the independent reality of situation genealogies. Such an axiom should be postulated for any type of experiment, as each experiment is directed at objects to be studied. We have thus established a relation between cellular genealogies and sequences of frames from time-lapse experiments. The final reconstruction of genealogies is then an information artefact that captures relevant knowledge about the real-world genealogies.
¹³ There is an ambiguity between a part of a frame and an image-part of a frame. For the sake of simplicity, we do not distinguish between the image of an entity and the entity itself; we could simply say that a presentic cell is a part of a frame. However, a frame can possess image-parts as artefacts to which no real entity corresponds.
5. Conclusion and Future Research
In this work, we outlined an ontological foundation of cellular genealogies, comprising a fundamental theory and a formal representation of a type of experiment and its results. The full framework will provide three levels of abstraction; this paper addresses the first two: the theory level and the experiment level. At the theory level, we analysed cellular genealogies as independent real-world entities using the onto-axiomatic method and proposed a partial formal axiomatisation of knowledge assumed to hold for every cellular genealogy. At the experiment level, we formally described time-lapse experiments and developed an axiomatic foundation of this domain.
Any experimental framework should be considered as a mediator between a theory and the real-world entities to be studied. Experiments provide data about a domain of interest; they play an indispensable role in supporting or refuting a theory, and thus in the further development and revision of theories. Our development of the overarching conceptual framework follows the onto-axiomatic method, adhering to the principles of model theory [23,26], as introduced in biology by Woodger [27]. As a next conceptual step, we will extend the genealogy types outlined here to model self-organising processes in biology as complex, interacting systems (embryonic development being a prime example). Building on existing work on collective phenomena [15], we will consider groups of cells and their mutual interactions. We could model groups of interacting cells¹⁴ as object-situations; e.g. a cell-group can be a specific tissue (or a group of precursor cells). We may further introduce material boundaries and dynamics of these cell-groups to build a cell-group genealogy. A critical problem will be to find appropriate levels of granularity; here, we will build on ideas from complex systems research.
Finally, we outline directions for future research, as we feel that the presented framework paves the way for new questions and might even open new fields:
Development of suitable representational levels. We presented theories on a general level in this paper. However, the instance level is still needed if we want to study individual cellular genealogies. We are currently investigating various representational levels as a continuation of the present paper.
Extending the ontological foundation of cellular genealogies. To elaborate on the presented framework, we will analyse existing knowledge in developmental biology and successively transform it into formalised axioms based on the onto-axiomatic method.
Elaborating Genealogy-Theories for particular model species.
An ideal first step would be the development of a complete genealogy-theory for the model organism Caenorhabditis elegans, as much is known about its genetics and development [2,28].
Extending the outlined theory to other levels of granularity. Our current genealogy-theory refers to the single-cell level as a 'middle-out' starting point, as already proposed by [29]¹⁵. We will consider two canonical extensions of granularity levels: we will explicitly model the state of cells at the molecular level using existing ontologies [30,31], and we will model cell-groups as tissue-level entities.
Modelling cellular genealogies in disease. To support computational approaches in systems medicine, we should elaborate a specific theory for abnormal genealogical patterns as found in certain cancers, such as leukaemia [32], and related diseases.
¹⁴ Such as the parasegments forming during patterning of Drosophila embryos [16].
¹⁵ Noble [29] credits Sidney Brenner with saying ‘I believe very strongly that the fundamental unit, the correct level of abstraction, is the cell and not the genome’.
References
[1] J.B. Wallingford, The 200-year effort to see the embryo, Science. 365 (2019) 758–759.
[2] J.E. Sulston, E. Schierenberg, J.G. White, and J.N. Thomson, The embryonic cell lineage of the nematode Caenorhabditis elegans, Dev. Biol. 100 (1983) 64–119.
[3] S.G. Megason, and S.E. Fraser, Imaging in systems biology, Cell. 130 (2007) 784–795.
[4] J. Huisken, J. Swoger, F. Del Bene, J. Wittbrodt, and E.H.K. Stelzer, Optical sectioning deep inside live embryos by selective plane illumination microscopy, Science. 305 (2004) 1007–1009.
[5] L.A. Royer, W.C. Lemon, R.K. Chhetri, Y. Wan, M. Coleman, E.W. Myers, and P.J. Keller, Adaptive light-sheet microscopy for long-term, high-resolution imaging in living organisms, Nat. Biotechnol. (2016). doi:10.1038/nbt.3708.
[6] K. McDole, L. Guignard, F. Amat, A. Berger, G. Malandain, L.A. Royer, S.C. Turaga, K. Branson, and P.J. Keller, In toto imaging and reconstruction of post-implantation mouse development at the single-cell level, Cell. (2018). doi:10.1016/j.cell.2018.09.031.
[7] R.M. Harland, A new view of embryo development and regeneration, Science. 360 (2018) 967–968.
[8] J. Wellmann, Model and movement: studying cell movement in early morphogenesis, 1900 to the present, Hist. Philos. Life Sci. 40 (2018) 59.
[9] J. Wellmann, Die Form des Werdens: eine Kulturgeschichte der Embryologie; 1760–1830, Wallstein, 2010.
[10] R.M. Power, and J. Huisken, Putting advanced microscopy in the hands of biologists, Nat. Methods. (2019). doi:10.1038/s41592-019-0618-1.
[11] A.N. Gonzalez-Beltran, P. Masuzzo, C. Ampe, G.-J. Bakker, S. Besson, R.H. Eibl, P. Friedl, M. Gunzer, M. Kittisopikul, S.E. Le Dévédec, S. Leo, J. Moore, Y. Paran, J. Prilusky, P. Rocca-Serra, P. Roudot, M. Schuster, G. Sergeant, S. Strömblad, J.R. Swedlow, M. van Erp, M. Van Troys, A. Zaritsky, S.-A. Sansone, and L. Martens, Community standards for open cell migration data, BioRxiv. (2019) 803064. doi:10.1101/803064.
[12] S. Leonelli, The challenges of big data biology, Elife. 8 (2019). doi:10.7554/eLife.47381.
[13] I. Glauche, R. Lorenz, D. Hasenclever, and I. Roeder, A novel view on stem cell development: analysing the shape of cellular genealogies, Cell Prolif. 42 (2009) 248–263.
[14] P. Burek, N. Scherf, and H. Herre, A pattern-based approach to a cell tracking ontology, Procedia Comput. Sci. 159 (2019) 784–793.
[15] Z. Wood, and A. Galton, A taxonomy of collective phenomena, Appl. Ontol. 4 (2009) 267–292.
[16] L. Wolpert, and C. Tickle, Principles of Development, OUP Oxford, 2011.
[17] P. Burek, N. Scherf, and H. Herre, Ontology patterns for the representation of quality changes of cells in time, J. Biomed. Semantics. 10 (2019) 16.
[18] R. Baumann, F. Loebe, and H. Herre, Axiomatic theories of the ontology of time in GFO, Appl. Ontol. 9 (2014) 171–215.
[19] D. Hilbert, Axiomatisches Denken, in: D. Hilbert (Ed.), Dritter Band: Analysis · Grundlagen der Mathematik · Physik Verschiedenes: Nebst einer Lebensgeschichte, Springer Berlin Heidelberg, Berlin, Heidelberg, 1935: pp. 146–156.
[20] H. Herre, General Formal Ontology (GFO): A Foundational Ontology for Conceptual Modelling, in: R. Poli, M. Healy, and A. Kameas (Eds.), Theory and Applications of Ontology: Computer Applications, Springer Netherlands, Dordrecht, 2010: pp. 297–345.
[21] H. Herre, B. Heller, P. Burek, R. Hoehndorf, F. Loebe, and H. Michalek, General Formal Ontology (GFO): A Foundational Ontology Integrating Objects and Processes. Part I: Basic Principles (Version 1.0), Research Group Ontologies in Medicine (Onto-Med), University of Leipzig, 2006.
[22] H. Driesch, Philosophie des Organischen: Gifford-Vorlesungen gehalten an der Universität Aberdeen in den Jahren 1907–1908, 1921. https://www.worldcat.org/title/philosophie-des-organischen-gifford-vorlesungen-gehalten-an-der-universitat-aberdeen-in-den-jahren-1907-1908/oclc/3408806.
[23] W. Hodges, Model Theory, Cambridge University Press, 1993.
[24] A.C. Neville, Animal Asymmetry, The Institute of Biology's Studies in Biology, Edward Arnold, London, 1976.
[25] A. Tarski, What is elementary geometry?, in: Studies in Logic and the Foundations of Mathematics, Elsevier, 1959: pp. 16–29.
[26] C.C. Chang, and H.J. Keisler, Model Theory, Elsevier, 1990.
[27] J.H. Woodger, The Axiomatic Method in Biology, with appendices by A. Tarski and W.F. Floyd, Cambridge University Press, 1937.
[28] S. Brenner, Nature's gift to science (Nobel lecture), Chembiochem. 4 (2003) 683–687.
[29] D. Noble, The Music of Life: Biology Beyond the Genome, Oxford University Press, Oxford, 2006.
[30] J. Bard, S.Y. Rhee, and M. Ashburner, An ontology for cell types, Genome Biol. 6 (2005) R21.
[31] M. Ashburner, C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M. Cherry, A.P. Davis, K. Dolinski, S.S. Dwight, J.T. Eppig, M.A. Harris, D.P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J.C. Matese, J.E. Richardson, M. Ringwald, G.M. Rubin, and G. Sherlock, Gene ontology: tool for the unification of biology. The Gene Ontology Consortium, Nat. Genet. 25 (2000) 25–29.
[32] C. Bahr, L. von Paleske, V.V. Uslu, S. Remeseiro, N. Takayama, S.W. Ng, A. Murison, K. Langenfeld, M. Petretich, R. Scognamiglio, P. Zeisberger, A.S. Benk, I. Amit, P.W. Zandstra, M. Lupien, J.E. Dick, A. Trumpp, and F. Spitz, A Myc enhancer cluster regulates normal and leukaemic haematopoietic stem cell hierarchies, Nature. 553 (2018) 515–520.