1. Introduction

Novel XBWT-based Distance Measures for Labeled Trees

Danilo G. Dolce

Sabrina Mantaci

Giuseppe Romana

Giovanna Rosone

Marinella Sciortino

0 0 University of Palermo , Italy 1 University of Pisa , Italy

Comparing labeled trees has applications in various domains, particularly in the study of cancer phylogenies. In this paper, we address the problem of comparing fully labeled unordered trees, focusing on their structural and label similarities. We define a novel class of distance measures by exploiting the eXtended Burrows-Wheeler Transform (XBWT), an extension to labeled trees of the well-known Burrows-Wheeler Transform. The XBWT, introduced in [Ferragina et al., FOCS 2005], produces a linearization of the tree that is both compressible and eficiently searchable. We extend the definition of this linearization to pairs of trees, and we produce a partition based on the prefixes of a given length ≥ 1 of the parent-to-root paths of the nodes. We define, for any ≥ 1, the distance measure by applying the Jaccard distance to each element of this partition. We prove that all the measures are pseudometrics, i.e. they assume non-negative values, are zero when applied to identical trees, are symmetric, and satisfy the triangle inequality. These measures become metrics when all the labels within each tree are distinct. Furthermore, we show that these distances are sensitive to some operations on trees, such as the removal and insertion of subtrees, swapping of subtrees, and label swapping. Finally, to show the efectiveness of our approach, we have analyzed experimentally the behavior of the distances when operations on trees are applied to a randomly generated fully labeled tree. Here, we present the results obtained in the case = 1.

eol>Burrows-Wheeler Transform XBWT labeled tree distance measure

1. Introduction

Trees are fundamental and well-studied combinatorial structures in computer science since their structure can encode, in a natural way, hierarchical relations in many domains.

Comparing trees is a key problem that arises in various fields like computational biology, structured text databases, image analysis, automated theorem proving, and compiler optimization, where it can be crucial to have efective distance measures able to capture diferent types of operations or transformations on the trees used to model data. A survey on the methods for comparing labeled trees based on simple local operations of deleting, inserting, and relabeling nodes can be found in [ 1 ]. In Bioinformatics, the trees are also used to model cancer phylogenies. In this context comparing trees typically means analyzing and comparing the genetic mutations and progression of cancer cells over time. In fact, according to the clonal theory of cancer [ 2 ], each node in these trees is labeled to represent a distinct genetic mutation or a group of mutations, and the edges represent the evolutionary pathways that these mutations have taken. The comparison between these trees aims to understand the evolutionary history of the cancer, identify common pathways of mutation, and potentially uncover patterns that could lead to better treatment strategies [ 3, 4, 5 ]. Several distance measures have been recently introduced with the aim to compare the topology of the trees and the mutations involved (see [ 6, 7, 8, 9, 10 ] and references therein). Other recent approaches are based on metaheuristics [ 11 ].

Motivated by the application in the study of cancer phylogeny, in this paper we focus our attention on the comparison between fully labeled trees, i.e. assuming that each node, whether internal or a leaf, has a label over a finite alphabet. However, the methodology we present in this paper can also be extended to the case of multi-labeled trees.

Here we address the problem of comparing unordered labeled trees by exploiting a linearization of the trees produced by the eXtended Burrows-Wheeler Transform (XBWT) [ 12, 13 ]. The XBWT is an elegant transformation that extends to trees the functionalities of the well-known Burrows-Wheeler Transform (denoted as BWT [ 14 ]), which is instead defined on strings and has been introduced in the context of data compression. The XBWT is applied to a labeled tree and emits, in addition to a permutation of the labels of the tree (tree linearization), a sequence of bits that makes the transformation reversible. This output is compressible and eficiently searchable. This is particularly useful for applications in Bioinformatics and XML document processing.

In this paper, we are not interested in the aspects related to compression and indexing. For this reason, here we will not use the full output of the XBWT, but only the tree linearization it produces. More in detail, we extend to a pair of trees the linearization computed by XBWT with the aim of measuring how the labels of the nodes of the two trees are mixed in the output of this transformation. We therefore define a novel class of distances between two trees through a partition of the linearization induced by the lexicographically sorted prefixes of length ≥ 1 of the parent-to-root paths of the nodes of the tree, eventually extended to length with an appropriate special character. To define these measures, the Jaccard distance is used in each element of such partition. We prove that such measures are pseudometrics. Note that the measures become metrics when all the labels within each tree are distinct.

Furthermore, we study the sensitivity of the measures with respect to some operations on trees, such as the insertion or deletion of a subtree, subtree swapping, and the exchange of labels between two nodes.

Finally, to show the robustness and efectiveness of our method, we have also analyzed the behavior of the measures on simulated datasets obtained by applying a sequence of operations on randomly generated trees. Here we present the results obtained for the case = 1 and when trees with distinct labels are considered.

In this paper, we do not focus on implementation aspects, but rather on methodological issues. The problems related to space and time complexity will be addressed in a subsequent full version of the paper. To our knowledge, this approach is innovative compared to the measures used in the literature for labeled tree comparison. The idea of comparing two combinatorial structures by measuring how they mix within a common structure has been used in the context of string comparison through the use of an extension to a collection of sequences of the Burrows-Wheeler Transform [ 15 ] but with diferent output partitioning strategies [ 16, 17 ]. Such an extension has been largely applied for comparing biological sequences (see [18] and references therein). Moreover, it has been recently used in [19] to reconstruct the phylogenetic tree of a collection of biological sequences. We believe that the methodology introduced in this paper can also provide new insights in the context of string comparison.

2. Preliminaries

Let Σ = {1, 2, . . . , } be a finite ordered alphabet with 1 < 2 < . . . < , where < denotes the standard lexicographic order. We denote by Σ* the set of words over Σ. Given a finite word ∈ Σ* , let be the length of , denoted ||. We also denote by [] the -th letter in for any 1 ≤ ≤ , therefore = [ 1 ][ 2 ] . . . [], and we denote by [, ] the substring [][ + 1] · · · []. A prefix is a substring of the form [1, ] for some , and a sufix one of the form [, ||]. Given two strings and , we denote by (, ) the length of the longest common prefix (LCP) of and , i.e., (, ) = max{ | [1, ] = [1, ]}.

A rooted tree = (, ), with set of nodes and ⊆ × set of edges, is a directed connected acyclic graph where: 1. all the nodes have one in-edge, except the root that has no in-edges; 2. all nodes have zero (leaves) or more (internal nodes) out-edges. The size of a tree , denoted by | |, is equal to the number | | of its nodes. Given a tree , we denote by ( ) the set of its leaves. If there is an edge from node to node , then is the parent of and is the child of . A path in the tree is a sequence of nodes 1, 2, . . . where is parent of − 1 for all 1 ≤ ≤ − 1. If < then is ancestor of , and is descendent of . The depth of a node is the length of the path from its parent to the root. The depth of the root is 0. For a given > 0, we denote by ≤ the subtree of from the root up to the nodes at depth . Two nodes that are children of the same node are called siblings. The tree is an ordered tree if a left-to-right order among siblings in is given, otherwise it is unordered. A tree = (, ) is labeled over the alphabet Σ if it is defined a labeling function ℓ : → Σ that associates a letter from Σ to each node of . For each node x of a labeled tree, we denote by (x) the string obtained as the concatenation of the labels in the path from the parent node of x to the root of the tree. Let us denote by ( ) the multiset of all the strings (x) for every node x ∈ . If is an ordered tree, is populated with the parent-to-root string paths of the tree nodes visited in pre-order.

The eXtended Burrows-Wheeler Transform (XBWT) of an ordered labeled tree , denoted by xbw( ), linearizes the tree with a string, denoted as ( ), obtained as a concatenation of the labels of all nodes, ordered according to the lexicographical sorting of their parent-to-root string paths in ( ).

In this paper, we will consider labeled trees where either all the nodes have diferent labels, or nodes with equal labels are allowed, but only if they are not siblings and appear in diferent root-to-leaf paths. Moreover, for a technical reason, we add a child to each leaf, labeled with a special symbol $ ̸∈ Σ. We exclude these nodes to count the size | |. Observe that for each tree extended in this way, the set ( ) consists solely of nodes labeled with $. We denote by Σ the set of all such trees.

Let and be two sets. The Jaccard distance between and is defined from the Jaccard coeficient of similarity for and , as follows: (, ) = | ∩ | , (, ) = 1 − (, ) = | ∪ | − | ∩ | .

| ∪ | | ∪ | We further assume that whenever = = ∅, (, ) = 1, and (, ) = 0. Observe that the measure is a metric. Obviously, 0 ≤ (, ) ≤ 1 and (, ) = 0 if = , and (, ) = 1 if ∩ = ∅ and ∪ ̸= ∅.

3. Linearization of Pairs of Trees via XBWT

In this section, we aim to define a linearization for pairs of ordered trees using an XBWT-based approach. Specifically, we define xbw(1, 2) as the string over the alphabet Σ × { 1, 2} defined in the following. Note that such a linearization can be defined for every pair of trees. However, motivated by practical applications, we assume here that each tree contains distinct labels or that it may contain repeated labels but only in diferent root-to-leaf paths and not for nodes with the same parent. Let us denote by (1, 2) the multiset of all the strings (x) for every node x in the trees 1 and 2. Let us denote by (1, 2) the string obtained by concatenating the labels of all the nodes of 1 and 2 ordered according to the lexicographical sorting of the paths in (1, 2).

The output xbw(1, 2) is a sequence of pairs (ℓ(x), ), where ℓ(x) ∈ and is the flag assuming value 1 or 2 if x comes from 1 or 2, respectively. However, for simplicity of exposition, in the figures and tables, we replace the flags with the full names of the trees considered.

We enrich the definition of XBWT of the two trees 1 and 2 with the LCP array, defined as [ 1 ] = 0 and [] = ( [], [ − 1]) for any 1 < < |1|+|2|+|(1)|+|(2)|. Example 1. Let 1 and 2 be the pair of trees depicted in Fig. 1a and in Fig. 1b, respectively. The output of xbw(1, 2) is represented in the table showed in Fig. 1c. We remark that, by construction, the first two lines in the table contain the roots of 1 and 2, respectively. The last column in the table contains the values of the LCP array.

Note that the parent-to-root string paths of two sibling nodes are equal. If the trees are ordered, the relative order of such paths is determined by the left-to-right order of the nodes. However, the distance measures we will introduce in the next section consider the labels of such nodes as a set, thus their relative order is not relevant. Then, they could be applied to unordered trees as well.

4. A new Class of Distance Measures between Trees

In this section, we introduce a new measure for comparing two unordered trees, 1 and 2, and we will prove that this measure is a pseudometric. To compute this measure, for a given D H $ G B $

G $ H $ (a) Tree 1 (b) Tree 2

F A $ C F $

B $ P $ (c) xbw(1, 2) and LCP array consists of the two columns and . The LCP array is contained in the last column of the table. Σ# = ⋃︀ the context.

′∈[0,] > 0 we map each word in of length < into a new word over the right-padded alphabet {︁Σ′ · { #}− ′ }︁. To improve readability, we omit whenever it is clear from In defining the measure, we refer to a partition of (1, 2) according to the values of the LCP array, defined as follows. |(1)| + |(2)|] obtained as union of the following intervals: Definition 1. LCP-based partition of order . Given a positive integer , let us consider the list of strings

()(1, 2) obtained by extending all the strings in (1, 2) to the length with the # fill character, where # is smaller than all the characters in Σ ∪ {$}. Let us denote by the LCP array for

()(1, 2). We denote by the partition of the interval [1, |1| + |2| + • ⋃︀ 1≤ ≤ < {[, ] | [] < , [ + 1] < , [] ≥ , ∀ ∈ [ + 1, ]}; ∑︁ [,]∈1 ∑︁ [,]∈2 • [, |1| + |2| + |(1)| + |(2)|] | [] < , [] ≥ |2| + |(1)| + |(2)|]. , ∀ ∈ [ + 1, |1| +

Note that each interval [, ] in the partition can be uniquely associated with a word in Σ, which is a -length prefix of the longest common prefix of the strings, possibly extended to length with the padding character #, in the list (1, 2) contained in the interval [, ].

We denote by ([, ]) such a word, and the indexes and are denoted by ( [, ]) and ( [, ]), respectively. For each pair of trees 1 and 2, we denote by Λ(1, 2) = { [, ] | [, ] ∈ }, that is the set of -length words from Σ# that appear as prefix of some word in ()(1, 2). Moreover, for ∈ {1, 2}, we denote by ([, ]) = { ∈ Σ ∪ {$} | (, ) ∈ xbw(1, 2)[, ]}, that is the set of labels in [, ] coming from the tree . Analogously, for each tree and word ∈ Σ#, we define the set Γ()() = { ∈ Σ ∪ {$} | · ∈ ()( )}, that is the set of letters in xbw( )[, ] such that [, ] = .

Definition 2.

The distance measure is defined as:

∑︁ [,]∈

Example 2. Let us consider the two trees 1 and 2 depicted in Figure 2. The partition of (1, 2) induced by (with = 1 and = 2) is shown in Table 1. By applying the Jaccard distance to each element of the partition,

1. The statement follows from the definition. 2. From the definition, we have that ( , ) = 0 since (1[, ], 2[, ]) = 0 for all [, ] ∈ .

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 1 2 1 2 1 2 1 2 1 2 1 2 2 1 1 2

B C X $ from the tree on the left 1 (a) by swapping the subtrees rooted in and , respectively.

Index Flag

1 2 0 0 0 1 1 1 0 2 0 2 1 3 0 2 0 2 (1) # # A A A A BA BA CA CA CBA CBA XCA

XCBA

YCA

YCBA The table shows how the partition relative to the two trees depicted in Fig. 2 can vary as changes. Specifically, the partitions 1 and 2 are illustrated. In the columns

the letters of the prefixes which define the partitions 1 and 2 respectively. Partition 2 contains the same intervals as 1, except for two intervals obtained as a refinement of the interval [ 9, 12 ] from 1. This refinement is indicated with a dashed line. The table also shows the output of XBWT applied (1) and (2) we highlight in bold to the two trees and LCP arrays.

3. The statement follows from the fact that the Jaccard distance is a metric and then the symmetric property holds. 4. Since the Jaccard distance is a metric and the triangle inequality holds, for each ∈ Σ holds that one can always find a ′ > such that ′ (12) > 0, i.e. ′ makes 1 and 2 distinguishable. This fact is shown in Fig. 2, where the trees (a) and (b) are indistinguishable for 1, i.e. their 1 distance is 0, but they become distinguishable when 2 distance is applied.

By using a similar argument as in the proof of Proposition 1, it is possible to prove the following corollary. and only if 1 = 2. proposition.

Corollary 1. Let be the set of all unordered trees whose nodes are labeled by distinct symbols. Then is a metric over for each ≥ 1, i.e. for each pair of trees 1, 2 ∈ , (1, 2) = 0 if

What we have stated in the previous remark can be more generally formalized in the following 1 + ∑︀∈[1,] |Σ|.

Proposition 2. Given two trees 1 and 2 in Σ, (1, 2) ≤ +1(1, 2), for each ≥ 1.

Note that if we consider the set Σ of all labeled trees, the measures can be used to define a class of dissimilarity measures normalized with respect to the number of words in Σ#, i.e.

5. Sensitivity to Operations on Trees

In this section, we evaluate how the distance changes when a swap of subtrees, label exchanges, insertion, or removal of nodes are applied to a tree. Other operations will be considered in the full version of the paper.

Proposition 3. Let 1 and 2 be two unordered trees such that 2 is obtained from 1, by swapping two disjoint subtrees v1 and v2 , rooted in the nodes v1 and v2, respectively. Then, 1(1, 2) ≤ 2, and (1, 2) ≤ 2(| v≤1− 2| + | v≤2− 2|) + 2 for all > 1, where | v≤1− 1| and | v≤2− 1| denote the number of nodes at depth at most − 1 in the subtrees v1 and v2 , respectively.

Proof. Let v1 and v2 be the two nodes in 1 that are the roots of the subtrees swapped to obtain 2, and let be the LCP-based partition of order for 1 and 2. Let us denote by (v1) and (v2) the parent-to–root string path of v1 and v2, respectively, both possibly padded up to length by using the character #. The upper bound for the distance (1, 2) is obtained when all the labels in 1 are distinct (and therefore in 2 as well) and if each of the two nodes is the only child of their respective parent. In this case, for all > 0, ((1)[ ( (v1)), ( (v1))], (2)[ ( (v1)), ( (v1))]) = 1 and ((1)[ ( (v2)), ( (v2))], (2)[ ( (v2)), ( (v2))]) = 1. Moreover, for all > 1, each node z in the subtrees v1 and v2 at depth at most − 2 from the root increases by 2 the distance . All the other intervals in the partition give a zero contribution to the distance. Then, the thesis follows.

The following proposition provides an evaluation of the distance between two trees when one is obtained from the other by removing or inserting an entire subtree.

Proposition 4. Let 1 and 2 be two unordered trees such that 2 is obtained by removing from 1 a subtree x ̸= 1 with root x. Then, (1, 2) ≤ | x| + 1.

Proof. Let x be the subtree rooted in x removed from 1 to obtain 2. Let be the LCP-based partition of order for 1 and 2. Let us denote by (x) the parent-to–root string path of x possibly padded up to length by using the character #.

The upper bound for the distance (1, 2) is obtained when x has no sibling nodes and (x) does not have any common -length prefix with other parent-to-root string paths. In this case, the parent of x and each node of x, leaves excluded, increases by 1 the distance .

The following proposition considers the operation of swapping the labels of two nodes. Note that the label swap does not involve any descendants of the nodes we are considering. Proposition 5. Let 1 and 2 be two unordered trees such that 2 is obtained from 1, by swapping the label of the nodes v1 and v2. Let us denote by v1 and v2 the subtrees rooted in the nodes v1 and v2, respectively. Then, 1(1, 2) ≤ 4, and (1, 2) ≤ 2(| v≤1− 1| + | v≤2− 1|) + 2 for all > 1, where | v≤1− 1| and | v≤2− 1| denote the number of nodes at depth at most − 1 in the subtrees v1 and v2 , respectively.

Proof. Let v1 and v2 be the two nodes in 1 whose labels are swapped in 1 to obtain 2. This means that in 2 we can find the subtrees v′ 1 and v′ 2 obtained by swapping the roots of the subtrees v1 and v2 in 1 respectively. Let be the LCP-based partition of order for 1 and 2. Let us denote by (v1) and (v2) the parent–to–root string path of v1 and v2, respectively, both possibly padded up to length by using the character #. In order to obtain the upper bound for the distance (1, 2) we assume that all the labels in 1 are distinct (and therefore in 2 as rooted in and , removing the subtree rooted in , and swapping the labels and . zero contribution to the distance. Then, the thesis follows. well), each of the two nodes is the only child of their respective parent and the subtrees v1 and v2 have distinct labels. In this case, ((1)[ ( (v1)), ( (v1))], (2)[ ( (v1)), ( (v1))]) = 1, and symmetrically ((1)[ ( (v2)), ( (v2))], (2)[ ( (v2)), ( (v2))]) = 1. For = 1, observe that ((1)[ (ℓ(v1)), (ℓ(v1))], (2)[ (ℓ(v1)), (ℓ(v1))]) = 1, and equivalently ((1)[ (ℓ(v2)), (ℓ(v2))], (2)[ (ℓ(v2)), (ℓ(v2))]) = 1. On the other hand, for all > 1, each node z in the subtrees v1 and v2 (equivalently in v′ 1 and v′ 2 ) at depth at most − 2 from the root increases by 2 the distance . All the other intervals in the partition give a Flag The tables show the phases of computation of 2(1, 2) = 6, 2(1, 3) = 4, 2(1, 4) = 12, where 1, 2, 3, 4 are depicted in Figure 3.

In Fig. 3 and Table 2 is displayed a worst–case example for each of the cases described in Propositions 3, 4, and 5, showing that the three upper–bounds are tight.

1 1 2 3 3 1 4 3 5 1 6 3 7 1 8 3 9 1 10 3 11 1 12 1 13 1 14 3 15 1 16 1 17 1 18 3

A A B A B A C A C A X BA $ BA Y CA Y CA $ DXBA $ EXBA $ FYCA $ FYCA D XBA E XBA F YCA F YCA

Num4ber of subtre6es removed8 (a) Subtree removals 10 2

Num4ber of sub6trees swap8ped (b) Subtree swap 10 12 10 d1 8 e c n ta 6 s i D 4 2 0 10 Number of subtrees swapped 40 20 30

50 (c) Subtree swap with repetitions

We have analyzed the behavior of the measures on simulated data, by evaluating the distance values after the application of perturbations, which consist of subtree removal and subtree swapping operations, on randomly generated trees. We conclude this section by showing in Fig. 4a the results obtained by considering the values of the 1 measure after applying a maximum of 10 subtree removals, chosen randomly, on randomly generated trees having distinct labels and such that each internal node has at most three children. We also show the behavior of the 1 measure when the operation applied on a randomly generated tree is the subtree swapping. In Fig. 4b, we consider the case where each subtree can be swapped at most once, and in Fig. 4c, successive swapping operations on the same subtree are allowed.

A more extensive description of the experiments will be presented in a subsequent extended version of the paper.

6. Conclusions and Further Work

In this paper, we introduced a new class of distances for unordered fully labeled trees. Theoretical results, as well as preliminary experimental analyses on simulated data, show that these measures can efectively capture some operations on trees such as removal and insertion of subtrees, subtree swapping, and label swapping. These distances are defined using an LCP-based partition of a linearization of trees, defined by a generalization of the XBWT. We have proven that the measures in this class are pseudometrics and become metrics when trees having distinct labels are considered. In the general case in which repeated labels are allowed, we have observed that for any given collection of trees, there exists an integer for which is a metric over the dataset. It would be interesting to experimentally determine for any given dataset of trees the smallest value of for which is a metric.

We have focused on combinatorial aspects related to the extension of XBWT to compare pairs of trees. The algorithmic issues related to the eficiency of computing these measures, as well as the use of this transformation for finding common subtrees, will be explored in the full paper.

Our preliminary experimental evaluation shows that our method is able to capture structural diferences and similarities between unordered trees, with significant possible implications for computational biology, XML data processing, and hierarchical clustering. We intend to evaluate the behavior of the measures concerning a more comprehensive set of tree operations, as well as to test these measures on real datasets for the study of cancer phylogenies. To this end, we plan to extend the methodology introduced in this paper to the more general case of multi-labeled trees, by using the Jaccard distance defined on multisets, and compare our approach with others existing in the literature.

Acknowledgments

Sabrina Mantaci is partially supported by INdAM-GNCS Project 2024- CUP E53C23001670001 (“Proprietà combinatorie e distanze basate su parole da evitare”). Giuseppe Romana, Giovanna Rosone, and Marinella Sciortino are partially supported by MUR PRIN project no. 2022YRB97K - “PINC” (Pangenome INformatiCs: From Theory to Applications), and partially funded by the INdAM-GNCS Project CUP E53C23001670001 (“Compressione, indicizzazione, analisi e confronto di dati biologici”). Giovanna Rosone is partially supported by PAN-HUB T4-AN-07 (“Hub multidisciplinare e interregionale di ricerca e sperimentazione clinica per il contrasto alle pandemie e all’antibiotico resistenza”), CUP I53C22001300001. Sabrina Mantaci and Marinella Sciortino are partially supported by the project “ACoMPA – Algorithmic and Combinatorial Methods for Pangenome Analysis” (CUP B73C24001050001) funded by the NextGeneration EU programme PNRR ECS00000017 Tuscany Health Ecosystem (Spoke 6). [18] V. Guerrini, F. A. Louza, G. Rosone, Metagenomic analysis through the extended Burrows-Wheeler transform, BMC Bioinform. 21-S (2020) 299. doi:10.1186/ S12859-020-03628-W. [19] V. Guerrini, A. Conte, R. Grossi, G. Liti, G. Rosone, L. Tattini, phyBWT2: phylogeny reconstruction via eBWT positional clustering, Algorithms Mol. Biol. 18 (2023) 11. doi:10. 1186/S13015-023-00232-4.

[1]

Bille , A survey on tree edit distance and related problems , Theoretical Computer Science 337 ( 2005 ) 217 - 239 . doi:https://doi.org/10.1016/j.tcs. 2004 . 12 .030.

[2]

P. C.

Nowell , The clonal evolution of tumor cell populations: Acquired genetic lability permits stepwise selection of variant sublines and underlies tumor progression ., Science 194 ( 1976 ) 23 - 28 . doi: 10 .1126/science.959840.

[3]

Beerenwinkel ,

C. D.

Greenman ,

Lagergren , Computational cancer biology: An evolutionary perspective , PLOS Computational Biology 12 ( 2016 ) 1 - 12 . doi: 10 .1371/ journal.pcbi. 1004717 .

[4]

Llabrés ,

Rosselló ,

Valiente , A generalized Robinson-Foulds distance for clonal trees, mutation trees, and phylogenetic trees and networks , in: BCB '20: 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics , Virtual Event, USA, September 21 - 24 , 2020 , ACM, 2020 , pp. 13 : 1 - 13 : 10 . doi: 10 .1145/3388440. 3412479.

[5]

Schwartz ,

A. A.

Schäfer , The evolution of tumour phylogenetics: principles and practice , Nature Reviews Genetics 18 ( 2017 ) 213 - 229 . doi: 10 .1038/nrg. 2016 . 170 .

[6]

Bernardini ,

Bonizzoni ,

G. Della

Vedova ,

Patterson , A Rearrangement Distance for Fully-Labelled Trees , in: 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019 ), volume 128 of Leibniz International Proceedings in Informatics (LIPIcs), Schloss Dagstuhl - Leibniz-Zentrum für Informatik , Dagstuhl, Germany, 2019 , pp. 28 : 1 - 28 : 15 . doi: 10 .4230/LIPIcs.CPM. 2019 . 28 .

[7]

DiNardo ,

Tomlinson ,

Ritz , L. Oesper, Distance measures for tumor evolutionary trees , Bioinformatics 36 ( 2019 ) 2090 - 2097 . doi: 10 .1093/bioinformatics/btz869.

[8]

Ciccolella ,

Bernardini ,

Denti ,

Bonizzoni ,

Previtali ,

Della Vedova , Tripletbased similarity score for fully multilabeled trees with poly-occurring labels , Bioinformatics 37 ( 2020 ) 178 - 184 . doi: 10 .1093/bioinformatics/btaa676.

[9]

Bernardini ,

Bonizzoni ,

Gawrychowski , On Two Measures of Distance Between Fully-Labelled Trees , in: I. L. Gørtz , O. Weimann (Eds.), 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020 ), volume 161 of Leibniz International Proceedings in Informatics (LIPIcs), Schloss Dagstuhl - Leibniz-Zentrum für Informatik , 2020 , pp. 6 : 1 - 6 : 16 . doi: 10 .4230/LIPIcs.CPM. 2020 . 6 .

[10]

Karpov ,

Malikic ,

M. K.

Rahman ,

S. C.

Sahinalp , A multi-labeled tree dissimilarity measure for comparing “clonal trees” of tumor progression , Algorithms for Molecular Biology 14 ( 2019 ) 17 . doi: 10 .1186/s13015-019-0152-9.

[11]

Ciccolella ,

G. Della

Vedova ,

Filipović , M. Soto Gomez, Three metaheuristic approaches for tumor phylogeny inference: An experimental comparison , Algorithms 16 ( 2023 ). doi: 10 .3390/a16070333.

[12]

Ferragina ,

Luccio , G. Manzini,

Muthukrishnan , Structuring labeled trees for optimal succinctness, and beyond , in: 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05) , 2005 , pp. 184 - 193 . doi: 10 .1109/SFCS. 2005 . 69 .

[13]

Ferragina ,

Luccio , G. Manzini,

Muthukrishnan , Compressing and indexing labeled trees, with applications , J. ACM 57 ( 2009 ). doi: 10 .1145/1613676.1613680.

[14]

Burrows ,

D. J.

Wheeler ,

A Block

Sorting data Compression Algorithm , Technical Report, DIGITAL System Research Center , 1994 .

[15]

Mantaci ,

Restivo , G. Rosone,

Sciortino , An extension of the Burrows-Wheeler Transform , Theoret. Comput. Sci . 387 ( 2007 ) 298 - 312 . doi: 10 .1016/j.tcs. 2007 . 07 . 014.

[16]

Mantaci ,

Restivo , G. Rosone,

Sciortino , A new combinatorial approach to sequence comparison , Theory Comput. Syst . 42 ( 2008 ) 411 - 429 . doi: 10 .1007/ s00224-007-9078-6.

[17]

Mantaci ,

Restivo ,

Sciortino , Distance measures for biological sequences: Some recent approaches , Int. J. Approx. Reason . 47 ( 2008 ) 109 - 124 . doi: 10 .1016/J.IJAR. 2007 . 03 .011.