=Paper= {{Paper |id=Vol-3618/forum_paper_19 |storemode=property |title=Parallel lives diagrams for co-evolving communities and their application to schema evolution |pdfUrl=https://ceur-ws.org/Vol-3618/forum_paper_19.pdf |volume=Vol-3618 |authors=Fanis Giachos,Nikos Pantelidis,Christos Batsilas,Apostolos V. Zarras,Panos Vassiliadis |dblpUrl=https://dblp.org/rec/conf/er/GiachosPBZV23 }} ==Parallel lives diagrams for co-evolving communities and their application to schema evolution== https://ceur-ws.org/Vol-3618/forum_paper_19.pdf
                                Parallel lives diagrams for co-evolving communities
                                and their application to schema evolution⋆
                                Fanis Giachos1 , Nikos Pantelidis2 , Christos Batsilas3 , Apostolos V. Zarras4 and
                                Panos Vassiliadis4
                                1
                                  Piraeus Bank, Athens, Greece
                                2
                                  CGI Nederland, Rotterdam, Netherlands
                                3
                                  Natech S.A., Ioannina, Greece
                                4
                                  University of Ioannina, Ioannina, Greece


                                                                         Abstract
                                                                         In this paper, we address the problem of modeling co-evolving peers in communities over time. Our
                                                                         motivation comes from the area of software and schema evolution; however, we generalize our modeling
                                                                         to cover communities of peer entities in general, evolving over discrete time beats, with quantifiable mea-
                                                                         surements of behavior. Furthermore, we demonstrate how our modeling can facilitate the visualization,
                                                                         comprehension, and automated analysis of the lives of such communities.

                                                                         Keywords
                                                                         Evolving communities, Software Evolution, Schema Evolution, Software Visualization




                                1. Introduction
                                Software systems are never complete or perfect; hence they continuously evolve, in order to
                                accommodate new requirements, adapt to changing operational environments, as well as to
                                correct internal problems and errors, either prior, or after they are discovered at their usage.
                                The study of software evolution has two aspects, as [1] eloquently states: (a) the what and
                                why of software evolution, that "focuses on the properties of the phenomenon, its causes and
                                identification of the drivers underlying development and maintenance activity", and, (b) the
                                how, that is "the methods, tools and technology to facilitate disciplined and efficient software
                                change". Understanding laws and patterns that guide software evolution allows us to recognize
                                mechanisms, tendencies, and (ideally) deterministic behaviors of how software systems change.
                                  A specific aspect we are addressing in this paper, concerns the understanding of how different
                                parts of a software system evolve together. The parts of a software system behave as peers
                                that co-exist in a community, where all the components must collaborate towards providing
                                the necessary functionality. In particular, we are motivated by the study of schema evolution,

                                ER2023: Companion Proceedings of the 42nd International Conference on Conceptual Modeling: ER Forum, 7th SCME,
                                Project Exhibitions, Posters and Demos, and Doctoral Consortium, November 06-09, 2023, Lisbon, Portugal
                                ⋆
                                  The work of all authors has been conducted during their time in the Univ. of Ioannina
                                $ fgiahos@hotmail.gr (F. Giachos); pantelidis.nikos@outlook.com (N. Pantelidis); christosbats@gmail.com
                                (C. Batsilas); firstname.lastname@cs.uoi.gr (A. V. Zarras); firstname.lastname@cs.uoi.gr (P. Vassiliadis)
                                 0000-0001-9521-5853 (A. V. Zarras); 0000-0003-0085-6776 (P. Vassiliadis)
                                                                       © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                    CEUR
                                    Workshop
                                    Proceedings
                                                  http://ceur-ws.org
                                                  ISSN 1613-0073
                                                                       CEUR Workshop Proceedings (CEUR-WS.org)




CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
where the components of a relational schema evolve over time to accommodate changing
information needs by the surrounding applications. Overall, the research question that drives
us is: can we model, and trace, the information necessary to allow us to study the joint co-evolution
of different parts of a software system (and, in particular, a relational schema) in order to be able
(a) to understand how the different parts of a system co-evolve over time, (b) identify highlights
and patterns over this co-evolution, and, (c) be able to come up with automated discoveries and
reporting of significant findings over the studied histories?
   Example. To explicate our stance on the problem, we motivate the discussion with an
example. In Figure 1 we depict a visual representation of the parallel lives of the relations of a
relational database schema, which form a community of co-evolving entities.
   Time is represented as a timeline of discrete time-beats (the columns of the visual representa-
tion). The different entities of the community (in the case of schema evolution: the relations of
the schema) are visually represented as the rows of a two-dimensional matrix. Each entity has
(a) a dedicated row where its life is visually depicted, and, (b) several aggregate details (depicted
as a pop-up window) at the bottom of the figure. The most fundamental properties for an entity
are (a) the timepoint when it joins the community (to which we typically refer to as "birth"), (b)
a possible timepoint when it leaves the community (referred to as "death" – in our case, the table
is removed from the schema), and, (c) the amount of change that takes place at each time point,
along one or more quantifiable measurements (visually depicted via the color saturation in the
diagram of Figure 1). The entities in this particular representation have been sorted according
to their birth, to facilitate a visual understanding of the progression of events (other types of
sorting, with different visual goals, are also possible – e.g., along the lines of the amount of
change they have undergone). The core of the model is a two-dimensional matrix of 𝐸𝑛𝑡𝑖𝑡𝑖𝑒𝑠
× 𝑇 𝑖𝑚𝑒𝐵𝑒𝑎𝑡𝑠 that concentrates the information needed for representing, visualizing, studying
and analyzing the history of the community.
   In summary, in Figure 1, the usage of the aforementioned modeling along a timeline, a set of
peer entities, and quantifiable measures of activity for each of them, is vividly demonstrated. The
most obvious usage of the model is the visualization part, which allows a quick understanding
(and reporting) of how the life of the community has evolved. Apart from facilitating reporting
and understanding, the modeling allows the fully automated identification of highlights, or
patterns, over the two-dimensional matrix: massive births, massive updates, progressive ex-
pansion, entities with continuous change, or entities without any change whatsoever, can be
automatically discovered, reported and visually highlighted on the basis of our model.
   Generalization. We have worked with the study of schema evolution histories at very
large numbers. However, we claim that our results are generalizable to larger settings, beyond
schema evolution. For a community to abide by our model one needs to have (a) a notion of
time, in a timeline of discrete time steps; (b) a set of discrete entities that form a community; (c)
the notion of entities joining and leaving the community (birth and death in our terminology)
during the monitored timeline; (d) measurable quantities for the entities of the community that
are measured throughout their participation in the community. Whenever the aforementioned
properties hold, our framework covers the evolution of such a community and can provide the
necessary modeling, visualization and analysis means to the analyst who wishes to understand
how the community evolves. The components of a software project (be it packages, classes,
modules, libraries or other software entities) are a clear case where our framework is directly
Figure 1: An annotated Parallel Lives Diagram, depicting the timeline, the entities, the changes,
important events and details for the lives of the entities of a community – in this case, the relations of
the schema of a database


applicable.
   Contributions and Roadmap. After starting with the presentation of related work in
Section 2, the paper proceeds to provide the following contributions. In Section 3, we present
the conceptual model of our approach. We require the collaboration of Entities, Timelines,
Measurement Types and Parallel Lives Matrices for the representation of the necessary infor-
mation required to characterize how the peers of a community co-evolve. In Section 4, we
present the definitions, examples and algorithms for the mining of interesting patterns from the
co-evolution of the entities of a community. In Section 5, we present how the application of our
modeling to the case of schema evolution, revealed interesting patterns of change over a data
set of 195 schema histories of Free-Open Source projects. Finally, we conclude our deliberations
with a discussion of the contributions of this paper as well as open roads for future work.


2. Background
To the best of our efforts, we were unable to find any works on modeling and visualizing
co-evolving communities, in general. However, there are several works pertaining to software
and database evolution that are clearly the motivation for our work – although we argue that
our modeling can be generalized to a broader set of contexts.
  Software Evolution. Software evolution has been studied for decades at several levels:
software architecture [2], design [3] and implementation [4]. The main driver for studying
software evolution have been Lehman’s laws and the theory that accompanies them, starting
in the mid ’70’s all the way to nowadays. For a discussion of Lehman’s laws on can refer to
[5, 6, 7, 8, 9], summarized in [1]. Other attempts towards finding regularities and patterns in
software evolution include [10, 11, 12, 13, 14, 15, 16, 17, 18, 17, 19, 20]. Although not all laws
are considered valid any more, the idea for searching in patterns on how software evolves is
fundamental in the research on software evolution.
   Schema Evolution. Schema evolution, which has been the main driver of our research,
involves the progressive change of the internal structure of a database over time. Typically, the
studies in the area of schema evolution are mostly observational, assessing the qualitative and
quantitative characteristics of schema evolution – i.e., answering the question "what are the
characteristics of the phenomenon that we study?". Several studies address this question in the
field of relational databases [21], [22],[23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33],
[34], [35]. Currently, interest has passed from relational to non-relational databases [36], [37] –
see [38] for an overview.
   Schema and Source co-evolution. As the impact of schema evolution can be very large
to the ecosystem built around the database [39, 40], there are also studies concerning ways to
adapt queries whenever the schema changes [41], [42], [43], [44], [45] – see [46] for an overview.
In the meanwhile, there are also works on how schema and source code co-evolve, both on the
area of studying the joint evolution and in the area of proposing techniques to synchronize
schema and applications as they both change [21], [23], [24], [25], [47], [48], [49].
   Software visualization of co-evolution. To a large extent, the most related area of research
to our effort is software visualization, with an emphasis on tracing co-evolution. The papers in
this category are not explicitly providing a conceptual modeling perspective to their method,
but rather, they focus on the visualization aspects.
   The authors of [50] provide some first visualization techniques for detecting patterns of
change. The authors of [51] provide parallel timelines resembling violin charts, for different
parts of the code to describe important events in the history of a system. In [52], the authors
provide a system with dense evolution lines for different parts of the code, where notes can
be attached for commenting. The authors of [53] use heatmaps to demonstrate how classes
cooperate in use cases, or how much each developer contributes to the maintenance of a
system. The authors of [54] propose an Entities × Entities matrix to trace co-evolution of code
components (in contrast to our proposal that includes time, too). [55] visualizes the co-change
of code and tests via bubble-charts and [56] as addition-deletion bar-charts.
   For 3D visual representations of evolving software, quite often represented via the "city
metaphor", one can refer to [57], [58], [59], [60]. We avoid the 3D city metaphor as overly
complicated contrasted to the simple representation that we provide.


3. The model
The main concepts that define the landscape of communities of jointly evolving peers concern
(a) time, (b) peers, and, (c) measured behavior.
3.1. Basic concepts
We assume a linear discrete version of time. We consider as time, a domain of values that is
isomorphic to the non-negative integers and consists of discrete and equidistant time beats. This
is not necessarily restricted to "human" time (which, of course, is also eligible for the role). For
example, when we study evolving software repositories, time can be modeled as the commits
made by the developers to a branch of the repository; in this case, each commit is treated as a
time beat and commits are isomorphically mapped to non-negative integers.
   We also assume a community of peer entities that evolve together. The entities can be
arbitrary. In the case of schema evolution, the entities under investigation are the relations
that appeared in the entire history of a certain schema. For each entity we have (a) a time-beat
of birth where the entity joins the community (for the case of schema evolution: the table is
introduced in the schema), (b) a time-beat of death, when the entity leaves the community
(for the case of schema evolution: a table is deleted from the schema), and, in-between, (c) an
evolving behavior, characterized through a vector of measurements (for the case of schema
evolution: a vector of measurements quantifying, for each table, for each time-beat, the number
of attributes inserted into the table, ejected from it, modified with respect to their data type,
etc).
   Therefore, the main concepts appearing in the domain can be listed as follows:
    • Timeline: a linear domain 𝒯 ∞ , which is isomorphic to the non-negative integers N0
      and provides a common context (or, timeframe) for the evolution of a community of peer
      entities. For practical purposes we will work with finite histories, thus, time will be a
      finite subset of 𝒯 ∞ , 𝒯 = {𝑇0 , . . . , 𝑇𝑚 }.
    • Beat: A beat is a unique member of a time domain. Thus, it can be a time unit (second,
      day, month, etc) or anything simulating a time domain (a stock market working day, a
      commit in a software repository, etc).
    • Entity: A distinct member of a community whose life is being monitored. An entity 𝐸
      has a time point of appearance, 𝐸.𝑇 𝑎 , when it first joins the community, an optional time
      point of ultimate disappearance 𝐸.𝑇 𝑑 , when it leaves the community and at any time
      point 𝑡𝑖 it has a state (which will be presented in the sequel).
    • Community: a finite set of Entities, 𝒞 = {𝐸1 , . . . , 𝐸𝑛 } monitored together for their joint
      evolution. A set of stocks in a stock market, or the set of tables of a schema are examples
      of communities.
    • Measurement Type: A common quantity that is monitored for the lives of the entities of
      a community and evaluated through numeric measurements. The entities of a community
      can be monitored for a number of Measurement Types. For example, a table can be
      monitored for the number of attributes injected, ejected, having their data type updated,
      as well as the sum of the above as a measurement of total activity.Each of these quantities
      is a Measurement Type. We assume that the entities of a community are all monitored
      upon a common set of Measurement Types, ℳ = {𝑀1 , . . . , 𝑀𝑘 }, with each measurement
      type 𝑀𝑖 having as its domain of values 𝑑𝑜𝑚(𝑀𝑖 ), which, for simplicity, we will uniformly
      assume to be R. Every member of the domain of a Measurement Type is a Measurement.
    • TimeEntityMeasurementSet(TEM): The combination of an entity, a beat, and a vector
      of measurements - i.e., a unique point in the life of an entity, along with the measurements
      that pertain to it. Assuming we have fixed the set of measurement types into a single
      Measurement Type, we refer to a TimeEntityMeasurement object. Thus, 𝑇 𝐸𝑀 ℳ is
      a function 𝑇 𝐸𝑀 ℳ : 𝒯 × 𝒞 → ℳ and 𝑇 𝐸𝑀 𝑀 for a single measure 𝑀 is a function
      𝑇 𝐸𝑀 𝑀 : 𝒯 × 𝒞 → 𝑀 (practically projecting ℳ to 𝑀 ).

   How are all these concepts combined? We introduce the Parallel Lives Matrix (PLM)
which is a matrix having (a) all the entities of a community as rows, (b) all the beats of a time
domain for this community as columns, (c) the respective TEM objects as cells. Although we
will revisit the definition in the sequel, for the moment we can point to Figure 2 that depicts a
visual representation of a PLM with a single Measure Type, 𝑇 𝑜𝑡𝑎𝑙𝐴𝑐𝑡𝑖𝑣𝑖𝑡𝑦.




Figure 2: A visual depiction of the Time Entity Measurements for a community of peers (the tables of a
database schema) for a time line of commits in a public repository, over a single measure: total change
activity.



3.2. Groups of time beats and entities
As already mentioned, one of the main purposes for working with this representation of the
lives of peers in a community is the possibility of visualizing the entire history of the community
in a two-dimensional surface, like a screen. The two-dimensional visual representation of a
Parallel Lives Matrix as a table of columns and rows is a straightforward, simple and intuitive
solution. However, throughout the construction and subsequent usage of a system to perform
this visualization, we repeatedly came across a problem, which seems to be fundamental in
handling long histories and large communities: Time and again, either the time-line, or the
community size was too big for a screen to accommodate. A generic requirement, thus, occurred,
to be able to group entities into homogeneous groups and time beats into homogeneous phases,
in order to reduce the visual footprint of the matrix. The following concepts, are therefore
added to our conceptual model for the handling of the evolving lives of peer communities.
    • Phase: A phase 𝑃 is a list of consecutive time beats 𝑃 ={𝑇𝑠𝑡𝑎𝑟𝑡 . . . 𝑇𝑒𝑛𝑑 } in the same
      domain. A time beat is also a trivial case of a phase. Practically, phases allow us to zoom
      out time in coarser time granules and e.g., group beats in months instead of individual
      days, in order to make the visualization fit in the limited area of a screen. A Phased
      Timeline 𝒫 𝒯 , or simply 𝒫, over a simple Timeline 𝒯 is a list of phases 𝒫 = {𝑃1 , . . . , 𝑃𝜏 }
      that introduces a partition over 𝒯 , i.e., 𝒯 is fully covered by 𝒫, and all members of 𝒫 are
      pairwise disjoint.
    • Entity Group: Assuming a community 𝒞, an entity group 𝐺 = {𝐸𝑥1 , . . . , 𝐸𝑥𝑘 } is a subset
      of 𝒞. Entity groups are produced by clustering entities with similar lives, to reduce the
      amount of rows in our visualization. A single entity is a trivial case of an entity group. A
      Grouped Community 𝒢 𝒞 , or simply 𝒢, is a partition of a community 𝒞 into pairwise
      disjoint and fully covering grouped communities.
    • GroupPhaseMeasurementSet (GPM): A GroupPhaseMeasurementSet is defined
      with respect to the combination of an entity group and a phase; its role is to aggregates
      the TimeEntityMeasurementSet instances pertaining to the entities of the entity group,
      and the time beats of the phase.

   Thus, assuming an aggregate function 𝛾, and a single measure type 𝑀 , 𝐺𝑃 𝑀 𝑀 is a function
𝐺𝑃 𝑀 𝑀 : 𝒫 × 𝒢 → 𝑀 𝛾 , where 𝑀 is mapped to a new Measure Type 𝑀 𝛾 , s.t., if for a given
phase 𝑃 , and a given entity group 𝐺, 𝐺𝑃 𝑀 𝑀 (𝑃, 𝐺) = 𝑣, then v is the aggregation of all 𝑚𝑖 ,
s.t., 𝑚𝑖 ∈ 𝑇 𝐸𝑀 𝑀 (𝑇𝑗 , 𝐸𝑘 ), 𝑇𝑗 ∈ 𝑃 and 𝐸𝑘 ∈ 𝐺. Then, 𝐺𝑃 𝑀 ℳ is a function 𝐺𝑃 𝑀 ℳ : 𝒫
× 𝒢 → ℳ𝛾 , where ℳ𝛾 is produced by the Cartesian Product of all 𝑀𝑖𝛾 , for all 𝑀𝑖 ∈ ℳ.
   Different techniques are applied to perform the groupings of time beats and entities. Specifi-
cally, a time-clustering algorithm splits the time domain into disjoint, consecutive phases that
fully cover the original time domain, with the goal of retaining as much uniformity in terms of
activity within each phase. Exactly along the same line, an entity-clustering algorithm splits a
set of peers into a set of disjoint clusters that fully cover the original set of entities, with the
goal of retaining as much uniformity in terms of activity within each group. For example, in
our implementation, we create phases and entity groups using agglomerative clustering over
the total activity of beats and entities respectively (with the observation that when it comes to
time, the beats of a cluster must be consecutive).
   Remember also that single beats and single entities are trivial phases and entity groups;
therefore, in the absence of clustering, TEM’s are trivial GPM’s and can be treated as such.
   Now, we can revisit the definition of a PLM, to generalize it to Phases and Entity Groups. A
Parallel Lives Matrix (PLM) which is a matrix having (a) all the entity groups of a community
as rows, (b) all the phases of a time domain for this community as columns, (c) the respective
GPM objects as cells. We will employ the notation 𝑃 𝐿𝑀 0 whenever both entity groups and
phases are trivial, and therefore the PLM concerns individual entities and time beats.
   A Parallel Lives Diagram (PLD) is a diagram that visually represents a PLM for a single
measurement type. Practically, a Parallel Lives Diagram is the visual representation for a
𝐺𝑃 𝑀 𝑀 .
   In Figure 3, we depict the basic modeling notions of our approach.
Figure 3: The model for the Parallel Lives Matrix environment


3.3. Cell states
By monitoring the community on-line, or, by studying log files post-hoc, we can have a way
to know, for each time beat, for each entity, whether this entity was a member of, or had
left, the community, and, in the case where it was a member of the community, the specific
measurements that pertain to the monitored measurement types (which we will collectively
refer to as "activity").
   Given this knowledge, concerning presence and activity, every cell in a 𝑃 𝐿𝑀 0 (equivalent:
PLD), 𝑃 𝐿𝑀 0 [𝐸, 𝑇 ] can have one of the following states (remember that an entity can leave
and rejoin many times):

    • Active: the entity group 𝐸 is alive, i.e., it has appeared in a previous beat than 𝑇 and is
      still a member of the community.
    • Absent or Inactive: the entity 𝐸 has not been created yet at time 𝑇 , or has been deleted in
      a previous beat than 𝑇 and not recreated at, or, before 𝑇 (remember there are entities
      that leave the community and later re-join).
    • Birth or Appearance: the cell is Active and this is the first cell of the row corresponding to
      𝐸 with status Active (i.e., this is the first appearance ever of the entity in the community).
    • Rebirth: the cell is not in a Birth state, however, it is Active and its previous cell in the
      same row is Absent.
    • Disappearance: the cell is Active and the next cell of the row is Absent.
    • Death: the cell is in a state of Disappearance and there is no other birth of the entity later
      – equivalently, it the last beat where the entity is active, and it is followed by a contiguous
      period of beats, spanning all the way to the final beat of time, where the cell state is
      absent.

   Aggregation. Whereas the state of non-aggregate data is straightforward to obtain, the same
does not hold for the state of aggregate cells. Assume we merge two entities 𝐸1 and 𝐸2 into
a group 𝐺, while at the same time, a list of their beats are merged into a new phase 𝑃 . Then,
we have a new cell 𝑃 𝐿𝑀 [𝐺, 𝑃 ] whose state we need to determine. Recursively, the problem
generalizes into merging a window of the PLM into a single cell.
   There are several possibilities for this decision. One possibility is to prioritize states: for
example, one might say that births are more important than alive states, which are more
important than disappearance states. This is an arbitrarily set order, for exemplary reasons –
one can allow other state rankings, depending on individual preferences. Another possibility
involves taking the state of the first / median / last cell of the first / middle / last entity of the
merged window. A majority vote of the cells is a third possibility.


4. The patterns
The power of our modeling is based on its simplicity. The model constructs are amenable to a
straightforward visualization, via a direct mapping to a two-dimensional matrix. Apart from the
obvious benefits in terms of intuition that come with simple visualizations, we have taken the
opportunity to mine for patterns over the representation. Patterns are interesting properties of
the two-dimensional representation, possibly on the basis of a single column, row, or cell – and
potentially via combinations of them – that demonstrate interesting behavior for the purpose
of understanding recurring behaviors in the lives of communities.

4.1. Example
In Figure 4, we can observe the existence of several patterns, concerning massive births deaths,
updates, and, progressive expansion. For the non-colorblind readers, the color of the cells per
pattern is also reported.

    • Observe the existence of a "Multiple Birth Stairs" pattern (a) between columns 0 and 5 and
      (b) between columns 20 and 29. In both cases, there are consecutive columns with cells of
      state Birth, and the number of these cells is higher than the threshold. The involved cells
      are painted pink.
    • In column 2, there are bulk deletions of entities (i.e., there are more than the threshold,
      with a number greater than 3), so the column supports the "Multiple Deletion" pattern.
      The involved cells are painted red.
    • Similarly, the "Multiple Updates" pattern is supported by columns 3, 10, 21, 23, 27 and 32.
      The involved cells are painted yellow.
    • Finally, in column 0 it is easy to observe the "Multiple Births" pattern. Color-wise, the
      cell’s coloring is overridden by the color of the "Multiple Birth Stairs" pattern.
Figure 4: The PLD for Biosql with the cells participating in the patterns colored.


4.2. Pattern definitions
In this section, we introduce a set of pattern families, as well as concrete patterns that belong to
them that are possibly derivable from the information on the evolution of a community. Both
the set of families and the set of patterns are extensible; in our deliberations we refer only to
the ones that we have implemented.
   As already mentioned, the tool we use for registering the state and evolution of the population
of peer entities is a two-dimensional matrix 𝐵𝑒𝑎𝑡𝑠 × 𝐸𝑛𝑡𝑖𝑡𝑖𝑒𝑠, which can easily be grouped
into a matrix 𝑃 ℎ𝑎𝑠𝑒𝑠 × 𝐸𝑛𝑡𝑖𝑡𝑦𝐺𝑟𝑜𝑢𝑝𝑠. In the rest of our deliberations, we will use the latter
as our setup of reference; however, all the characterizations and algorithms are immediately
applicable to the simple domain model, which is a trivial case of the latter setup.
   The first family of patterns that we introduce (Def. 4.1) involves patterns where each column
can be tested in isolation from the others. This can involve the existence of a cell with a certain
state ("did a birth occur in this phase?"), or, more commonly, whether the cardinality of cells
with a certain state exceeds a threshold value ("there are too many births in this phase"). The
latter sub-family of patterns is singled out also as a distinct family of interest (Def. 4.2).

Definition 4.1. Single column, local-cell-test pattern. Assume a 𝑃 𝐿𝐷[𝑛 × 𝑚], with 𝑛
rows and 𝑚 columns. A single column, local-cell-test pattern is a predicate that when applied to
a column, returns true or false on the basis of evaluating a condition on the cells of the column,
one-at-a-time, i.e., independently of the state of other cells or columns.

Definition 4.2. Single column, counting, local-cell-test pattern. A single column, counting,
local-cell-test pattern is a single column, local-cell-test pattern, where the verification of pattern
existence involves counting the number of cells of a certain state.
For a column 𝐶, and the possibility of testing the state of any of its cells, say 𝑐, independently
of other cells or columns, we can have a counting pattern test of the form:

             𝑐𝑜𝑢𝑛𝑡𝑖𝑛𝑔𝑃 𝑎𝑡𝑡𝑒𝑟𝑛𝑇 𝑒𝑠𝑡(𝐶, 𝜏 ) = {𝑐𝑜𝑢𝑛𝑡(𝑐) > 𝜏 |𝑐 ∈ 𝐶, ℎ𝑜𝑙𝑑𝑠𝑇 𝑒𝑠𝑡𝑒𝑑𝑆𝑡𝑎𝑡𝑒(𝑐)}

where 𝜏 is a counting threshold. Three prominent examples of such a pattern concern:

       • 𝑚𝑢𝑙𝑡𝑖𝑝𝑙𝑒𝐵𝑖𝑟𝑡ℎ𝑠(𝐶, 𝜏 ), the case of multiple births in a column, with
         ℎ𝑜𝑙𝑑𝑠𝑇 𝑒𝑠𝑡𝑒𝑑𝑆𝑡𝑎𝑡𝑒(𝑐): 𝑐.𝑠𝑡𝑎𝑡𝑒 == 𝑏𝑖𝑟𝑡ℎ
       • 𝑚𝑢𝑙𝑡𝑖𝑝𝑙𝑒𝐷𝑒𝑎𝑡ℎ𝑠(𝐶, 𝜏 ), the case of multiple deaths in a column, with
         ℎ𝑜𝑙𝑑𝑠𝑇 𝑒𝑠𝑡𝑒𝑑𝑆𝑡𝑎𝑡𝑒(𝑐): 𝑐.𝑠𝑡𝑎𝑡𝑒 == 𝑑𝑒𝑎𝑡ℎ
       • 𝑚𝑢𝑙𝑡𝑖𝑝𝑙𝑒𝑈 𝑝𝑑𝑎𝑡𝑒𝑠(𝐶, 𝜏 ),the case of multiple updates in a column, with
         ℎ𝑜𝑙𝑑𝑠𝑇 𝑒𝑠𝑡𝑒𝑑𝑆𝑡𝑎𝑡𝑒(𝑐): 𝑐.𝑠𝑡𝑎𝑡𝑒 == 𝑎𝑐𝑡𝑖𝑣𝑒 && 𝑐.𝑢𝑝𝑑𝑎𝑡𝑒𝑀 𝑒𝑡𝑟𝑖𝑐 > 0

   Another family of patterns involves sliding a window over (a) several columns, and, (b)
several rows of the PLM ("several" can become "all" to capture holistic patterns, too) and testing
a predicate (def 4.3). A pattern that we have frequently observed in the evolution of relational
schemata involves subsequent births in contiguous columns (Def. 4.4). Observe Figure 4: the
PLD of the figure shows the life of the schema of a specific database-backed project, named
BioSQL. The rows of the PLD are sorted by birth (columns are inherently sorted as they represent
time and they are isomorphic to the natural numbers). Observe the middle bottom part of
the figure: several adjacent columns demonstrate births, one after the other. To the extent
that the PLD is sorted by birth and time, the visual impression from the respective birth cells
(highlighted in intense tonality – for the non-colorblind: in pink) is a "staircase" which is also
the name of the pattern.

Definition 4.3. Sliding Window pattern. Assume a 𝑃 𝐿𝐷[𝑛 × 𝑚], with 𝑛 rows and 𝑚
columns. A sliding window pattern is a predicate that when applied to a set of columns, returns true
or false on the basis of evaluating a condition over the entire set of cells contained in a "window"
area defined over several (possibly all of the) rows of the involved columns.
                                                                                𝐵
Definition 4.4. Strict 𝜏 𝐵 -staircase of births. Assume a 𝜏 𝐵 -sized list 𝐿𝜏𝑗 of adjacent columns
   𝐵
𝐿𝜏𝑗 = {𝐶𝑗 , 𝐶𝑗+1 , . . . , 𝐶𝑗+𝜏 𝐵 }. If every column in the list contains cells with a birth status, the
         𝐵
list 𝐿𝜏𝑗 demonstrates a strict staircase of births.

  We can relax this definition by allowing some of the list’s columns not to contain births (you
can see in the middle bottom part of Figure 4 a couple of such columns that do not annul the
overall behavior -or the visual impression- of a staircase). One potential definition (admittedly,
approximate) of a relaxed staircase pattern is based on simple counting cells with birth status.

Definition 4.5. Approximate 𝜏 𝐵,𝑁 -staircase of births. Assume a 𝜏 𝐵 -sized list of adjacent
           𝐵
columns 𝐿𝜏𝑗 = {𝐶𝑗 , 𝐶𝑗+1 , . . . , 𝐶𝑗+𝜏 𝐵 }. If the columns in the list contains at least 𝜏 𝑁 cells with
                            𝐵
a birth status, the list 𝐿𝜏𝑗 demonstrates an approximate staircase of births.
  Algorithm 1: Generic single-column, count-based pattern extractor algorithm.
   Input: a matrix 𝑃 𝐿𝐷[𝑛 × 𝑚], with 𝑛 entity groups, 𝑚 phases, and 𝑃 𝐿𝐷[𝐸𝑖 , 𝑃𝑗 ] the
          amount of change that took place for entity group 𝐸𝑖 at phase 𝑃𝑗 ; a threshold of
          occurrences that qualifies a column to fulfill a pattern 𝜏
   Output: a set of columns C, each of which demonstrates an occurrence of the tested
            pattern
 1 begin
 2    C=∅
 3    forall 𝑃𝑗 ∈ 𝒫 do
 4        counter = 0
 5        forall 𝐺𝑖 ∈ 𝒢 do
 6            if 𝑠𝑢𝑝𝑝𝑜𝑟𝑡𝑠𝑃 𝑎𝑡𝑡𝑒𝑟𝑛(𝑃 𝐿𝐷[𝐺𝑖 , 𝑃𝑗 ]) then
 7                counter++
 8            end
 9        end
10        if 𝑐𝑜𝑢𝑛𝑡𝑒𝑟 ⋃︀
                      > 𝜏 then
11            C = C 𝑃𝑗
12        end
13    end
14    return C
15 end

16 Interface supportsPattern(cell) : Boolean is an overloaded interface
17     test cell state depending on the pattern searched
18 end




4.3. Algorithm for testing single-column, counting-based patterns
The basic algorithm for the handling of single-column patterns where the qualification of
a column for supporting the pattern is based only on counting, is depicted in Algorithm 1.
For different single-column patterns, different conditions and thresholds can apply. In our
deliberations, we have worked with 𝜏 having a value of 3. Depending on the pattern being
searched, the implementations of the function 𝑠𝑢𝑝𝑝𝑜𝑟𝑡𝑠𝑃 𝑎𝑡𝑡𝑒𝑟𝑛(𝑐𝑒𝑙𝑙) differ. Specifically:

     • For the detection of massive births in a column, the test requires Cell.state == BIRTH
     • For the detection of massive deaths in a column, the test requires Cell.state == DEATH
     • For the detection of massive updates in a column, the test requires Cell.state == ACTIVE
       && 𝑃 𝐿𝐷[𝐺𝑖 , 𝑃𝑗 ] > 0

4.4. Variations and optimizations
There are several variations that one can apply to the simple pattern checking algorithm. A
first, simple modification can be applied when testing rows of the PLD instead of columns. In
this case, instead of searching for phases where a pattern emerged, one can search for entity
groups with interesting behavior. Examples of such tests include:

       • Tests for rows, with a massive/low/zero number of changes (i.e., the ones whose activity
         is beyond/below a certain threshold)
       • Tests for rows with more than one birth, i.e., entities that joined the community, left, and
         re-joined later (e.g., in the case of schema evolution, tables that were removed from the
         schema, only to reappear a few commits later)

A straightforward optimization that we have performed in our implementation is to embed
all single-column checks into the same nested-loops pair. Thus, instead of checking the sup-
portPattern predicate for different patterns in separate loops, we execute all the supportPattern
checks, for all the patterns we want to test, within a single loop, via different counters and
result column-sets, one per pattern. Note also that for the patterns that we check, the matrix
need not be sorted, as the checks are based only on counting cells with the appropriate state.

4.5. Multi-column, shape-based patterns
Now we can define an algorithm for the relaxed version of the birth staircase. For every column
of a sorted PLD, Algorithm 2 checks a window of 𝜏 𝑊 columns for births. To the extent that the
PLD is sorted by birth, if these columns contain births, these births will be placed immediately
after the last birth of the column under investigation. The algorithm is approximate, as, instead
of a shape-based pattern, it checks for the cardinality of the set of cells with births in the window.
If this set exceeds a threshold 𝜏 𝑁 , the window qualifies for a staircase.


5. An application to the study of schema evolution
In this Section, we discuss how our framework is applied to the study of schema evolution.
In particular, we investigate the existence of the aforementioned patterns in the histories of
relational schemata from Free-Open Source projects in a large dataset from the literature.

5.1. Dataset and toolset
For investigating the extent of the presence of patterns, we employ the Schema_Evo_2019 data
set1 from the literature; specifically, from [34]. The data set contains 195 schema histories of
Free-Open Source projects, that were collected from Github with specific collection criteria.
In order to avoid bias, as well as insignificant projects, the author of [34] filtered out of the
corpus the projects with 0 stars or just 1 contributor, DDL files with ’example’, ’demo’, ’test’,
terms in their path, and, projects without a history of versions for the DDL file. We refer the
reader to [34, 35] for a detailed discussion of the collection process, its representativeness, and
its generalization liabilities. The time is measured in commits, the entities are the individual
tables that appear in the schema histories and the measures monitored are: attributes born

1
    Available at Github at https://github.com/DAINTINESS-Group/Schema_Evolution_Datasets/tree/master/
    SchemaEvolutionDatasets2020
   Algorithm 2: Relaxed birth staircase algorithm
    Input: a matrix 𝑃 𝐿𝐷[𝑛 × 𝑚], with 𝑛 entity groups, 𝑚 phases, and 𝑃 𝐿𝐷[𝐺𝑖 , 𝑃𝑗 ] the
           amount of change that took place for entity group 𝐺𝑖 at phase 𝑃𝑗 ; a threshold of
           occurrences that qualifies a column to fulfill a pattern 𝜏 𝑁 ; a column-width
           threshold of the window 𝜏 𝑊 ; a row-height threshold of the window 𝜏 𝐻 ;
    Output: a set of columns L𝑇 , each of which demonstrates an occurrence of the tested
             pattern; a set of cells L𝐶 participating in the pattern
 1 begin
  2    Sort the rows of 𝑃 𝐿𝐷 by birth, ascending
  3    L𝑇 = ∅; L𝐶 = ∅
  4    forall 𝐶𝑗 ∈ 𝑃 𝐿𝐷 do
  5        Let C𝑗 be the set of cells of column 𝐶𝑗 with 𝑠𝑡𝑎𝑡𝑒 == 𝐵𝐼𝑅𝑇 𝐻
  6        Let 𝑅𝑗⋆ be the last row with a cell in a state of birth, at column 𝐶𝑗
           L𝐶𝑗 = ∅; L𝑗 = ∅
                      𝑇                       ◁ L𝐶                  𝑇
  7                                               𝑗 cell-set, L𝑗 column-set, locally
            ◁ Iterate the window of col’s post 𝐶𝑗 , rows post 𝑅𝑗⋆
  8        forall 𝐶𝑘 ∈ {𝐶𝑗+1 , . . . , 𝐶𝑗+𝜏 𝑊 } do
  9            forall 𝑅𝑖 ∈ {𝑅𝑗+1 ⋆ , . . . , 𝑅⋆     } do
                                              𝑗+𝜏 𝐻
10                 forall cells 𝑐 = 𝑃 𝐿𝐷[𝑅𝑖 , 𝐶𝑘 ] do
11                      if 𝑐.𝑠𝑡𝑎𝑡𝑒 == 𝐵𝐼𝑅𝑇 𝐻 then
 12                         add 𝑐 to L𝐶 𝑗
 13                         add 𝐶𝑘 to L𝑇𝑗
14                      end
15                 end
16             end
17         end ⋃︀
18         if |L𝐶
                𝑗    C𝑗 | > 𝜏 𝑁 then
               L𝑇 = L𝑇 L𝑇𝑗
                          ⋃︀
19
               L𝐶 = L𝐶 L𝐶
                           ⋃︀     ⋃︀
20                             𝑗     C𝑗
21         end
22     end
23     return L𝑇 ,L𝐶
24 end




with a new table, attributes injected into an existing table, attributes deleted with a removed
table, attributes ejected from a surviving table, attributes having a changed data type, or a
participation in a changed primary key – all summarized in a measure of total activity (which
is the one employed in the respective PLM’s and PLD’s).
   We have implemented a tool2 that allows the parsing, internal representation and analysis of
community histories, which has been used to study evolving schema histories.

2
    https://github.com/DAINTINESS-Group/PlutarchParallelLives
Table 1
The descriptive statistics for the presence of patterns in the Schema_Evo_2019 data set.




   In Table 1, we depict the descriptive statistics for the Schema_Evo_2019 data set, with
particular emphasis on the median and probability of presence for the discussed patterns. In all
our experiments we have used a quite moderate threshold of 𝜏 = 3. It is quite interesting that
the patterns do not demonstrate a uniform behavior of presence. The Massive Birth pattern
is fairly popular and present in 56% of the studied projects. This can be quite easily explained
by the fact that most databases start with a ’big-bang’ of introducing a significant percentage
of their schema in the 0-th version. On the other hand, several patterns are rather unpopular:
as typically mentioned in the literature, the removal of tables is scarce – let alone the massive
removal (present in just 9% of the projects). The progressive expansion in subsequent steps is
present in just 19% of the studied projects. Somewhere in the middle of the popularity spectrum
is the existence of massive updates: in 21% of the projects, one can observe the presence of
collective, focused maintenance, or expansion, of the schema.


6. Conclusions
In this paper, we have presented a conceptual model that involves entities, timelines, measure-
ments, and their groupings in parallel lives matrices, in order to capture how the different
entities of a community co-evolve. The model allows the visualization and understanding of the
community evolution in a simple, but also powerful, way. The model also allows the mining of
interesting patterns of change that highlight important points and members in the evolution of
the community. We have applied our modeling to the case of schema evolution (which has been
the motivating reason for this research) and derived patterns of change from a large number of
schema histories.
   There are several paths for future research. We have only scratched the surface of the patterns
that can be investigated over Parallel Lives Diagrams. The generalization of birth staircases
and massive updates to a "x changes soon after y" pattern is a simple example. Tool-wise, the
interactive handling of roll-ups and drill-downs in the case of hierarchical structures is also a
possibility. Finally, the fully automated reporting, that requires the ranking and pruning of the
discovered patterns, in terms of their significance is another potential road for future research.
References
 [1] M. M. Lehman, J. C. Fernandez-Ramil, Software Evolution and Feedback: Theory and
     Practice, Wiley, 2006. ISBN-13: 978-0-470-87180-5.
 [2] M. Wermelinger, Y. Yu, A. Lozano, Design principles in architectural evolution: A case
     study, in: 24th IEEE International Conference on Software Maintenance (ICSM 2008),
     Beijing, China, 2008, pp. 396–405.
 [3] Z. Xing, E. Stroulia, Analyzing the evolutionary history of the logical design of object-
     oriented software, IEEE Trans. Software Eng. 31 (2005) 850–868.
 [4] I. Herraiz, D. Rodriguez, G. Robles, J. M. Gonzalez-Barahona, The evolution of the laws of
     software evolution: A discussion based on a systematic literature review, ACM Comput.
     Surv. 46 (2013) 1–28. doi:10.1145/2543581.2543595.
 [5] L. A. Belady, M. M. Lehman, A model of large program development, IBM Systems Journal
     15 (1976) 225–252.
 [6] M. M. Lehman, Programs, life cycles, and laws of software evolution, Proceedings of the
     IEEE 68 (1980) 1060–1076. doi:10.1109/PROC.1980.11805.
 [7] M. M. Lehman, Laws of software evolution revisited, in: Proceedings of 5th European
     Workshop on Software Process Technology, (EWSPT ’96), Nancy, France, October 9-11,
     1996, 1996, pp. 108–124.
 [8] M. M. Lehman, J. F. Ramil, P. Wernick, D. E. Perry, W. M. Turski, Metrics and laws
     of software evolution - the nineties view, in: 4th IEEE International Software Metrics
     Symposium (METRICS 1997), 1997, p. 20.
 [9] M. M. Lehman, J. F. Ramil, D. E. Perry, On evidence supporting the feast hypothesis and
     the laws of software evolution, in: 5th IEEE International Software Metrics Symposium
     (METRICS 1998), Bethesda, Maryland, USA, 1998, pp. 84–88.
[10] M. J. Lawrence, An examination of evolution dynamics, in: Proceedings, 6th International
     Conference on Software Engineering (ICSE 1982), Tokyo, Japan, 1982, pp. 188–196.
[11] S. S. Pirzada, A Statistical Examination of the Evolution of the Unix System, Ph.D. thesis,
     Imperial College, University of London, 1988.
[12] N. T. Siebel, S. Cook, M. Satpathy, D. Rodríguez, Latitudinal and longitudinal process
     diversity, Journal of Software Maintenance 15 (2003).
[13] M. W. Godfrey, Q. Tu, Evolution in open source software: A case study, in: Proceedings of
     the International Conference on Software Maintenance, 2000, pp. 131–142.
[14] M. W. Godfrey, Q. Tu, Growth, evolution, and structural change in open source software,
     in: Proceedings of the 4th International Workshop on Principles of Software Evolution,
     IWPSE ’01, 2001, pp. 103–106.
[15] G. Robles, J. J. Amor, J. M. Gonzalez-Barahona, I. Herraiz, Evolution and growth in large
     libre software projects, in: Proceedings of the Eighth International Workshop on Principles
     of Software Evolution, IWPSE ’05, 2005, pp. 165–174.
[16] S. Koch, Software evolution in open source projects: a large-scale investigation, J. Softw.
     Maint. Evol. 19 (2007) 361–382. doi:10.1002/smr.v19:6.
[17] G. Xie, J. Chen, I. Neamtiu, Towards a better understanding of software evolution: An
     empirical study on open source software, in: 25th IEEE International Conference on
     Software Maintenance (ICSM 2009), Edmonton, Alberta, Canada, 2009, pp. 51–60.
[18] I. Herraiz, G. Robles, J. M. Gonzalez-Barahon, Comparison between slocs and number of
      files as size metrics for software evolution analysis, in: Proceedings of the Conference on
      Software Maintenance and Reengineering, CSMR ’06, IEEE Computer Society, Washington,
      DC, USA, 2006, pp. 206–213. URL: http://dl.acm.org/citation.cfm?id=1116163.1116405.
[19] R. Vasa, Growth and Change Dynamics in Open Source Software Systems, Ph.D. thesis,
      Swinburn Univ. of Technology, Australia, 2010.
[20] A. Israeli, D. G. Feitelson, The linux kernel as a case study in software evolution, J. Syst.
      Softw. 83 (2010) 485–501. doi:10.1016/j.jss.2009.09.042.
[21] D. Sjøberg, Quantifying schema evolution, Information and Software Technology 35 (1993)
      35–44.
[22] C. Curino, H. J. Moon, L. Tanca, C. Zaniolo, Schema evolution in wikipedia: toward a web
      information system benchmark, in: Proceedings of ICEIS 2008, 2008.
[23] D.-Y. Lin, I. Neamtiu, Collateral evolution of applications and databases, in: Joint Intl.
     Annual ERCIM Workshops on Principles of Software Evolution (IWPSE) and Software
      Evolution (Evol), 2009, pp. 31–40.
[24] S. Wu, I. Neamtiu, Schema evolution analysis for embedded databases, in: 2011 IEEE 27th
      International Conference on Data Engineering Workshops, ICDEW ’11, 2011, pp. 151–156.
[25] D. Qiu, B. Li, Z. Su, An empirical analysis of the co-evolution of schema and code in
      database applications, in: 2013 9th Joint Meeting on Foundations of Software Engineering,
     (ESEC/FSE), 2013, pp. 125–135.
[26] A. Cleve, M. Gobert, L. Meurice, J. Maes, J. H. Weber, Understanding database schema
      evolution: A case study, Sci. Comput. Program. 97 (2015) 113–121.
[27] I. Skoulis, P. Vassiliadis, A. V. Zarras, Growing up with stability: How open-source
      relational databases evolve, Information Systems 53 (2015) 363–385.
[28] P. Vassiliadis, A. V. Zarras, I. Skoulis, Gravitating to rigidity: Patterns of schema evolution
     - and its absence - in the lives of tables, Information Systems 63 (2017) 24–46.
[29] P. Vassiliadis, A. V. Zarras, Schema evolution survival guide for tables: Avoid rigid
      childhood and you’re en route to a quiet life, Journal of Data Semantics 6 (2017) 221–241.
[30] J. Delplanque, A. Etien, N. Anquetil, O. Auverlot, Relational database schema evolution:
     An industrial case study, in: 2018 IEEE International Conference on Software Maintenance
      and Evolution, ICSME 2018, Madrid, Spain, September 23-29, 2018, IEEE Computer Society,
      2018, pp. 635–644.
[31] P. Vassiliadis, M. Kolozoff, M. Zerva, A. V. Zarras, Schema evolution and foreign keys:
      a study on usage, heartbeat of change and relationship of foreign keys to table activity,
      Computing 101 (2019) 1431–1456.
[32] K. Dimolikas, A. V. Zarras, P. Vassiliadis, A study on the effect of a table’s involvement in
      foreign keys to its schema evolution, in: 39th International Conference on Conceptual
      Modeling, ER 2020, Vienna, Austria, November 3-6, 2020, publisher = Springer, series =
      Lecture Notes in Computer Science, volume = 12400, pages = 456–470„ 2020.
[33] D. Braininger, W. Mauerer, S. Scherzinger, Replicability and reproducibility of a schema
      evolution study in embedded databases, in: ER 2020 Workshops, Vienna, Austria, November
      3-6, 2020, volume 12584 of Lecture Notes in Computer Science, Springer, 2020, pp. 210–219.
[34] P. Vassiliadis, Profiles of schema evolution in free open source software projects, in: 37th
      IEEE International Conference on Data Engineering, ICDE 2021, Chania, Greece, April
     19-22, 2021, IEEE, 2021, pp. 1–12.
[35] P. Vassiliadis, G. Kalampokis, Taxa and super taxa of schema evolution and their
     relationship to activity, heartbeat and duration, Inf. Syst. 110 (2022) 102109. URL:
     https://doi.org/10.1016/j.is.2022.102109. doi:10.1016/j.is.2022.102109.
[36] M. Klettke, H. Awolin, U. Störl, D. Müller, S. Scherzinger, Uncovering the evolution history
     of data lakes, in: IEEE International Conference on Big Data, BigData 2017, Boston,A,
     USA, December 11-14, 2017, IEEE Computer Society, 2017, pp. 2462–2471.
[37] S. Scherzinger, S. Sidortschuck, An empirical study on the design and evolution of nosql
     database schemas, in: 39th International Conference on Conceptual Modeling, ER 2020,
     Vienna, Austria, November 3-6, 2020, volume 12400 of Lecture Notes in Computer Science,
     Springer, 2020, pp. 441–455.
[38] U. Störl, M. Klettke, S. Scherzinger, Nosql schema evolution and data migration: State-
     of-the-art and opportunities, in: Proceedings of the 23rd International Conference on
     Extending Database Technology, EDBT 2020, Copenhagen, Denmark, March 30 - April 02,
     2020, OpenProceedings.org, 2020, pp. 655–658.
[39] T. A. Limoncelli, SQL is no excuse to avoid devops, Commun. ACM 62 (2019) 46–49. URL:
     https://doi.org/10.1145/3287299. doi:10.1145/3287299.
[40] M. Stonebraker, R. C. Fernandez, D. Deng, M. L. Brodie, Database decay and what to do
     about it, Commun. ACM 60 (2017) 11. doi:10.1145/3014349.
[41] A. Maule, W. Emmerich, D. S. Rosenblum, Impact analysis of database schema changes,
     in: 30th International Conference on Software Engineering (ICSE 2008), Leipzig, Germany,
     May 10-18, 2008, ACM, 2008, pp. 451–460.
[42] S. K. Gardikiotis, N. Malevris, A two-folded impact analysis of schema changes on database
     applications, Int. J. Autom. Comput. 6 (2009) 109–123.
[43] G. Papastefanatos, P. Vassiliadis, A. Simitsis, Y. Vassiliou, Hecataeus: Regulating schema
     evolution, in: ICDE, 2010, pp. 1181–1184.
[44] M. Hartung, J. F. Terwilliger, E. Rahm, Recent advances in schema and ontology evolution,
     in: Z. Bellahsene, A. Bonifati, E. Rahm (Eds.), Schema Matching and Mapping, Data-Centric
     Systems and Applications, Springer, 2011, pp. 149–190.
[45] P. Manousis, P. Vassiliadis, A. V. Zarras, G. Papastefanatos, Schema evolution for databases
     and data warehouses, in: 5th European Summer School on Business Intelligence , eBISS
     2015, volume 253 of Lecture Notes in Business Information Processing, Springer, 2015, pp.
     1–31.
[46] L. Caruccio, G. Polese, G. Tortora, Synchronization of queries and views upon schema
     evolutions: A survey, ACM Trans. Database Syst. 41 (2016) 9:1–9:41.
[47] M. Goeminne, A. Decan, T. Mens, Co-evolving code-related and database-related changes
     in a data-intensive software system, in: IEEE Conference on Software Maintenance,
     Reengineering, and Reverse Engineering, CSMR-WCRE 2014, Antwerp, Belgium, 2014, pp.
     353–357.
[48] S. Scherzinger, W. Mauerer, H. Kondylakis, Debinelle: Semantic patches for coupled
     database-application evolution, in: 37th IEEE International Conference on Data Engineer-
     ing, ICDE 2021, Chania, Greece, April 19-22, 2021, IEEE, 2021, pp. 2697–2700.
[49] P. Vassiliadis, F. Shehaj, G. Kalampokis, A. V. Zarras, Joint source and schema evolution:
     Insights from a study of 195 FOSS projects, in: Proceedings 26th International Conference
     on Extending Database Technology, EDBT 2023, Ioannina, Greece, March 28-31, 2023, 2023,
     pp. 27–39. URL: https://doi.org/10.48786/edbt.2023.03. doi:10.48786/edbt.2023.03.
[50] S. A. Bohner, D. Gracanin, T. Henry, K. Matkovic, Evolutional insights from UML and
     source code versions using information visualization and visual analysis, in: Proceedings
     of the 4th IEEE International Workshop on Visualizing Software for Understanding and
     Analysis, VISSOFT 2007, Banff, Alberta, Canada, June 25-26, 2007, 2007, pp. 145–148.
     doi:10.1109/VISSOF.2007.4290713.
[51] M. Krstajic, E. Bertini, D. A. Keim, Cloudlines: Compact display of event episodes in
     multiple time-series, IEEE Trans. Vis. Comput. Graph. 17 (2011) 2432–2439. URL: https:
     //doi.org/10.1109/TVCG.2011.179. doi:10.1109/TVCG.2011.179.
[52] A. Kuhn, M. Stocker, Codetimeline: Storytelling with versioning data, in: 34th International
     Conference on Software Engineering, ICSE 2012, June 2-9, 2012, Zurich, Switzerland, 2012,
     pp. 1333–1336. doi:10.1109/ICSE.2012.6227086.
[53] O. Benomar, H. A. Sahraoui, P. Poulin, Visualizing software dynamicities with heat maps,
     in: 2013 First IEEE Working Conference on Software Visualization (VISSOFT), Eindhoven,
     The Netherlands, 2013, pp. 1–10. doi:10.1109/VISSOFT.2013.6650524.
[54] S. Rufiange, G. Melançon, Animatrix: A matrix-based visualization of software evolution,
     in: Second IEEE Working Conference on Software Visualization, VISSOFT 2014, Victoria,
     BC, Canada, September 29-30, 2014, 2014, pp. 137–146. doi:10.1109/VISSOFT.2014.30.
[55] B. Ens, D. J. Rea, R. Shpaner, H. Hemmati, J. E. Young, P. Irani, Chronotwigger: A visual
     analytics tool for understanding source and test co-evolution, in: Second IEEE Working
     Conference on Software Visualization, VISSOFT 2014, Victoria, BC, Canada, September
     29-30, 2014, 2014, pp. 117–126. doi:10.1109/VISSOFT.2014.28.
[56] M. D. Feist, E. A. Santos, I. Watts, A. Hindle, Visualizing project evolution through
     abstract syntax tree analysis, in: 2016 IEEE Working Conference on Software Vi-
     sualization, VISSOFT 2016, Raleigh, NC, USA, October 3-4, 2016, 2016, pp. 11–20.
     doi:10.1109/VISSOFT.2016.6.
[57] C. Mesnage, M. Lanza, White coats: Web-visualization of evolving software in 3d, in:
     S. Ducasse, M. Lanza, A. Marcus, J. I. Maletic, M. D. Storey (Eds.), Proceedings of the 3rd
     International Workshop on Visualizing Software for Understanding and Analysis, VISSOFT
     2005, Budapest, Hungary, September 25, 2005, IEEE Computer Society, 2005, pp. 40–45. URL:
     https://doi.org/10.1109/VISSOF.2005.1684302. doi:10.1109/VISSOF.2005.1684302.
[58] L. Meurice, A. Cleve, DAHLIA 2.0: A visual analyzer of database usage in dynamic and
     heterogeneous systems, in: 2016 IEEE Working Conference on Software Visualization, VIS-
     SOFT 2016, Raleigh, NC, USA, October 3-4, 2016, 2016, pp. 76–80. doi:10.1109/VISSOFT.
     2016.15.
[59] T. Schneider, Y. Tymchuk, R. Salgado, A. Bergel, Cuboidmatrix: Exploring dynamic
     structural connections in software components using space-time cube, in: 2016 IEEE
     Working Conference on Software Visualization, VISSOFT 2016, Raleigh, NC, USA, October
     3-4, 2016, 2016, pp. 116–125. doi:10.1109/VISSOFT.2016.17.
[60] F. Pfahler, R. Minelli, C. Nagy, M. Lanza, Visualizing evolving software cities, in: Working
     Conference on Software Visualization, VISSOFT 2020, Adelaide, Australia, September 28 -
     October 2, 2020, IEEE, 2020, pp. 22–26. URL: https://doi.org/10.1109/VISSOFT51673.2020.
     00007. doi:10.1109/VISSOFT51673.2020.00007.